Show HN: Npm module that makes easy to convert JSON to CSV (opens in new tab)

(npmjs.com)

64 pointskauegimenes9y ago63 comments

63 comments

47 comments · 11 top-level

billforsternz9y ago· 14 in thread

Off topic rant I need to get off my chest. Yesterday I troubleshooted a bizarre problem. New versions of Excel don't necessarily understand the simplest CSV files. Reason: They forgot that the C in CSV stands for comma, and instead expect whatever character happens to be set as the 'delimiter' character in the Windows regional preferences. Here in New Zealand, it seems that character is semicolon instead of comma. For some reason. I think Excel was finished circa 2000 and now they focus mainly on making it worse.

ignoramous9y ago

There was an interesting article abt CSV parsing a few years ago, here's the HN discussion https://news.ycombinator.com/item?id=7796268

To quote the top most comment (by user, slg): "CSV are a headache. Like the article says, RFC4180 doesn't necessarily represent the real world. However sometimes you just have to reject things that aren't spec.

Not too long ago I was struggling with one of these CSV issues and received some good advice from Hans Passant [1] on a Stack Overflow question pertaining to my problem (emphasis mine):

'It is pretty important that you don't try to fix it. That will make you responsible for bad data for a long time. Reject the file for being improperly formatted. If they hassle you about it then point out that it is not RFC-4180 compatible. There's another programmer somewhere that can easily fix this.'

It makes perfect sense in hindsight. If you accept a malformed CSV file, people will expect you to accept any malformed data that has a CSV extension. You are taking on a lot of extra responsibility to cover for the lack of work by another programmer. Odds are they can make a change to fix the problem that takes a fraction of the time it would take you work around it. You just have to raise the issue.

I realize that rejecting bad files isn't really possible in every circumstance. But I have a feeling it is an option more times than you might initially think."

TAForObvReasons9y ago

If you know the file is going to be used in Excel, you can force a delimiter with the `sep` directive. The first characters of the file should be

    sep=,<newline>

sillysaurus39y ago

Bleh. That's so Microsoft.

It took roughly ten seconds to find huge problems with this approach: https://stackoverflow.com/questions/20395699/sep-statement-b...

1 more reply

strictfp9y ago

It also behaves differently if you open the csv from inside excel rather than if you doubleclick the file in explorer.

And we haven't even gotten to file encodings yet.

someSven9y ago

Just for the case, anyone doesn't know. Be carefull with opening CSVs in Excel or just don't do it.

https://www.owasp.org/index.php/CSV_Excel_Macro_Injection

bruth9y ago

Yep. Just don't do it. https://genomebiology.biomedcentral.com/articles/10.1186/s13...

PetahNZ9y ago

Lets not forget when you have "ID" as the first header (ie. col 1 row 1), it assumes the CSV file is SLYK format.

electroly9y ago

The secret to using CSV in Excel is to use Data > From Text instead of opening the CSV file directly. This opens the import wizard which allows you to, among other things, change the separator character. It's still not perfect, but it's much better than opening the CSV file directly.

timvdalen9y ago

Same in Holland.

The 'fun' thing is that Excel for OS X does not do this, it uses commas.

We used to always just generate CSV files with semicolons since most of our clients were using Dutch Excel on Windows. As some of them moved to OS X, we've mostly been guessing what format to use.

wolco9y ago

I'm curious why New Zealand would have a semi-colon as the default. Is there a historical or cultural reason?

bitmedley9y ago

I checked and New Zealand actually uses a comma as the separator, not a semi-colon. You can verify this by going to Control Panel -> Region -> Change to: English (New Zealand) -> Additional settings... -> List separator

A semi-colon is generally used as the default list separator when the region/locale uses a comma as the decimal separator for numbers. For example Dutch (Netherlands) uses a comma for the decimal separator (ex. 3,14) whereas in English (US) we use a decimal point (ex 3.14). If comma were used as the default list separator in such a region then all floating point numbers would need to be quoted (ex. "3,14") which would make the size of the CSV file larger and also make the file less human-readable

1 more reply

billforsternz9y ago

Nothing comes to mind. I think someone might have had a moment of unnecessary inspired creativity. The main point is, using this setting is a mistake for this context. Let CSV be CSV.

amptorn9y ago

> I think Excel was finished circa 2000 and now they focus mainly on

Not breaking established behaviour?

rplnt9y ago

Do you directly open the file? It used to ask during import.

danso9y ago· 5 in thread

In the complex object example, what's the usecase for turning that nested object into a multi-column CSV? I guess the overarching question is, why turn a single object at all into a CSV? It's just as machine/human readable as the JSON version of it, and there's no particular benefit in using either structure as it's not data meant to be processed in bulk.

The complex array usecase is where an opinionated-type of conversion tool is particularly needed, but I wonder why it behaves like this:

   name: 'Robert',
   lastname: 'Miller',
   family: null,
   location: [1231,3214,4214]

lastname,name,family.type,family.name,nickname,location Miller,Robert,,,,1231,3214,4214

Why not have `location_1, location_2, location_3`, instead of having a single location column? The latter implementation makes the data difficult to quickly use in a program (like a spreadsheet).

kauegimenesOP9y ago

This only happends if its a simple array, but if you have something like

   name: 'Robert',
   lastname: 'Miller',
   family: null,
   location: [{city: 'A'}, {city: 'B'}]

you would get 2 csv lines one with location.city A and other with B

c0nfused9y ago

To answer your first question: A whole bunch of non progammer types use csv for everything. SAS and SPSS rule everything around them and they do input from csv. Source: I used to make good money doing the turn x to csv for academics all over.

As to the last, I got nothing. It's how I would have rolled.

robmiller9y ago

Hey, why are you picking on me? :)

kauegimenesOP9y ago

It's possible to implement the `location_1, location_2, location_3` as a optional feature.

But i am not sure what name to use for this option.

traviscj9y ago

Maybe some variant on "Extract array elements to separate columns"? or did you mean what to name the command line option? Maybe a "--list-treatment=" option, which can be "concat" or "separate"?

dc29y ago· 4 in thread

For a <5Kb library, I recommend removing the underscore dependency. You should make this as lightweight as possible if you want a lot of adoption.

stdbrouw9y ago

Converting to CSV is mainly a server-side problem, not something you'd want to do in the browser, so this doesn't really feel like an issue to me.

kauegimenesOP9y ago

Thanks for the tip, you are welcome to submit a PR if you want to =)

d2xdy29y ago

Not OP, but I just submitted a PR removing underscore as a dep. :)

1 more reply

JoshGlazebrook9y ago

Yeah this looks like it's only using helper functions like _.isFunction(), _.isNumber(), and _.extend().

TAForObvReasons9y ago· 3 in thread

The most popular NPM module for JSON to CSV conversion is json2csv: https://www.npmjs.com/package/json2csv https://github.com/zemirco/json2csv

cheapsteak9y ago

Here's a graphical comparison of downloads of popular json to csv conversion libraries -

http://npmcharts.com/compare/json2csv,jsonexport,csv,fast-cs...

/shameless plug

1 more reply

kauegimenesOP9y ago

json2csv require you to tell every header and don't work well with nested objects and require json2csv-stream for large json files.

drunkcatsdgaf9y ago

False, headers are optional. Nested objects are still a problem though :(

1 more reply

codezero9y ago· 3 in thread

Cool, I immediately wondered if you handled a lot of the tricky CSV things like quotes and internal commas, it looks like it does a good job.

Comparing against Numbers (probably a bad comparison?) I am seeing one slightly different result:

If I have a field that is "hehe"

It's encoded as ""hehe""

It looks like Numbers adds the enclosing quotes:

"""hehe"""

I only see enclosing quotes if there's a comma:

"hehe,hehe" -> """hehe,hehe"""

Anyhow, I'm not sure what the "correct" thing to do here is, if there is one, just a heads up!

chc9y ago

CSV is pretty poorly standardized, but if you're going by RFC-4180, Numbers is definitely right, due to the interaction of two rules:

* double quotes aren't allowed inside a field that isn't double-quoted

* double-quotes that do appear in a field have to be escaped by preceding them with another double quote.

So by RFC-4180, I'm pretty sure ""hehe"" shouldn't be possible, and the way to represent "hehe" with the quotes is """hehe""".

codezero9y ago

Cool, yep, thanks for pointing out the RFC. I don't envy anyone making CSV parsers and basically assume they won't work when I need to use them :)

kauegimenesOP9y ago

Here is all the logic related to this behavior: https://github.com/kauegimenes/jsonexport/blob/master/lib/co...

BrandiATMuhkuh9y ago· 2 in thread

Converting JSON to CSV is a usable/good way when analysing NoSQL data. However I'm trying try find since a while a data analysis/query tool for JSON data itself. I saw that mixpanel made a query tool/language called JQL (JavaScript query language) which seems nice. Also RQL (rethink query language) seems nice too. But I can't find a simple tool in which I can import my JSON data and query it. It should be similar to SQLite-browser. Simply import your CSV/JSON file, and query it.

trefn9y ago

(I work at Mixpanel)

Honestly you could probably write a quick pipeline to dump your json data into Mixpanel and then use JQL -- it would be a little hacky but if you have less than a few million rows it shouldn't be too much work (and would still be free at that volume).

kauegimenesOP9y ago

What you think about this syntax https://github.com/mmckegg/json-query?

tambourine_man9y ago· 2 in thread

Just yesterday I was looking for the other way around, CVS->JSON

TeMPOraL9y ago

- Wrap the entire file with { and }

- Wrap each line with [ and ]

- eval the file

;).

tambourine_man9y ago

It wasn't neatly quoted, which I assume is a must for JSON, but yeah, I ended up going on a different path.

kmike849y ago· 1 in thread

From a Python developer point of view, design is a bit surprising - why isn't it created as two separate libraries:

1) csv writer (and reader?) which takes care of all csv dialects crazyness; 2) a library which "flattens" nested objects/arrays?

(1) is not opinionated (besides a few API choices), it just has to be correct; it doesn't make much sense to re-implement (1) everywhere.

(2) can be more opinionated, it is easy to disagree with design choices, there is more room for personal preferences.

For example, in Python there is CSV stdlib module, and for (2) there are libraries like https://github.com/scrapinghub/flatson. Why put both to the same library? Is it something ideomatic in node.js world, with a deeper reason to design libraries this way (e.g. download size), or is it just an oversight?

abritinthebay9y ago

It could be. The question is: why should it be?

Everything you said makes sense if those are the goals, but if they aren't... well, why should they be?

The design you suggest would be done by some module authors for sure, but there's no reason it needs to be that design.

freezer3339y ago· 1 in thread

Nice! If you want to do something similar with Excel xlsx this works: https://www.npmjs.com/package/jsonexcel Not sure how it handles huge arrays though...

kauegimenesOP9y ago

Thanks for sharing, you can also use jsonexport with rowDelimiter: '\t' for a xls compatible output.

madamelic9y ago· 1 in thread

This is a new one. Most people want to go the opposite way.

Either way, neat addition.

nolok9y ago

> Most people want to go the opposite way.

When you want to easily provide something excel/gsuite/... Will take in without using an heavier excel compatible library, CSV can be quite decent.

chid9y ago

There's a similar command in pandas called json_normalize (http://pandas.pydata.org/pandas-docs/stable/generated/pandas...)

j / k navigate · click thread line to collapse

63 comments

47 comments · 11 top-level

billforsternz9y ago· 14 in thread

ignoramous9y ago

There was an interesting article abt CSV parsing a few years ago, here's the HN discussion https://news.ycombinator.com/item?id=7796268

Not too long ago I was struggling with one of these CSV issues and received some good advice from Hans Passant [1] on a Stack Overflow question pertaining to my problem (emphasis mine):

I realize that rejecting bad files isn't really possible in every circumstance. But I have a feeling it is an option more times than you might initially think."

TAForObvReasons9y ago

If you know the file is going to be used in Excel, you can force a delimiter with the `sep` directive. The first characters of the file should be

    sep=,<newline>

sillysaurus39y ago

Bleh. That's so Microsoft.

It took roughly ten seconds to find huge problems with this approach: https://stackoverflow.com/questions/20395699/sep-statement-b...

1 more reply

strictfp9y ago

It also behaves differently if you open the csv from inside excel rather than if you doubleclick the file in explorer.

And we haven't even gotten to file encodings yet.

someSven9y ago

Just for the case, anyone doesn't know. Be carefull with opening CSVs in Excel or just don't do it.

https://www.owasp.org/index.php/CSV_Excel_Macro_Injection

bruth9y ago

Yep. Just don't do it. https://genomebiology.biomedcentral.com/articles/10.1186/s13...

PetahNZ9y ago

Lets not forget when you have "ID" as the first header (ie. col 1 row 1), it assumes the CSV file is SLYK format.

electroly9y ago

timvdalen9y ago

Same in Holland.

The 'fun' thing is that Excel for OS X does not do this, it uses commas.

We used to always just generate CSV files with semicolons since most of our clients were using Dutch Excel on Windows. As some of them moved to OS X, we've mostly been guessing what format to use.

wolco9y ago

I'm curious why New Zealand would have a semi-colon as the default. Is there a historical or cultural reason?

bitmedley9y ago

1 more reply

billforsternz9y ago

Nothing comes to mind. I think someone might have had a moment of unnecessary inspired creativity. The main point is, using this setting is a mistake for this context. Let CSV be CSV.

amptorn9y ago

> I think Excel was finished circa 2000 and now they focus mainly on

Not breaking established behaviour?

rplnt9y ago

Do you directly open the file? It used to ask during import.

danso9y ago· 5 in thread

The complex array usecase is where an opinionated-type of conversion tool is particularly needed, but I wonder why it behaves like this:

   name: 'Robert',
   lastname: 'Miller',
   family: null,
   location: [1231,3214,4214]

lastname,name,family.type,family.name,nickname,location Miller,Robert,,,,1231,3214,4214

Why not have `location_1, location_2, location_3`, instead of having a single location column? The latter implementation makes the data difficult to quickly use in a program (like a spreadsheet).

kauegimenesOP9y ago

This only happends if its a simple array, but if you have something like

   name: 'Robert',
   lastname: 'Miller',
   family: null,
   location: [{city: 'A'}, {city: 'B'}]

you would get 2 csv lines one with location.city A and other with B

c0nfused9y ago

As to the last, I got nothing. It's how I would have rolled.

robmiller9y ago

Hey, why are you picking on me? :)

kauegimenesOP9y ago

It's possible to implement the `location_1, location_2, location_3` as a optional feature.

But i am not sure what name to use for this option.

traviscj9y ago

Maybe some variant on "Extract array elements to separate columns"? or did you mean what to name the command line option? Maybe a "--list-treatment=" option, which can be "concat" or "separate"?

dc29y ago· 4 in thread

For a <5Kb library, I recommend removing the underscore dependency. You should make this as lightweight as possible if you want a lot of adoption.

stdbrouw9y ago

Converting to CSV is mainly a server-side problem, not something you'd want to do in the browser, so this doesn't really feel like an issue to me.

kauegimenesOP9y ago

Thanks for the tip, you are welcome to submit a PR if you want to =)

d2xdy29y ago

Not OP, but I just submitted a PR removing underscore as a dep. :)

1 more reply

JoshGlazebrook9y ago

Yeah this looks like it's only using helper functions like _.isFunction(), _.isNumber(), and _.extend().

TAForObvReasons9y ago· 3 in thread

The most popular NPM module for JSON to CSV conversion is json2csv: https://www.npmjs.com/package/json2csv https://github.com/zemirco/json2csv

cheapsteak9y ago

Here's a graphical comparison of downloads of popular json to csv conversion libraries -

http://npmcharts.com/compare/json2csv,jsonexport,csv,fast-cs...

/shameless plug

1 more reply

kauegimenesOP9y ago

json2csv require you to tell every header and don't work well with nested objects and require json2csv-stream for large json files.

drunkcatsdgaf9y ago

False, headers are optional. Nested objects are still a problem though :(

1 more reply

codezero9y ago· 3 in thread

Cool, I immediately wondered if you handled a lot of the tricky CSV things like quotes and internal commas, it looks like it does a good job.

Comparing against Numbers (probably a bad comparison?) I am seeing one slightly different result:

If I have a field that is "hehe"

It's encoded as ""hehe""

It looks like Numbers adds the enclosing quotes:

"""hehe"""

I only see enclosing quotes if there's a comma:

"hehe,hehe" -> """hehe,hehe"""

Anyhow, I'm not sure what the "correct" thing to do here is, if there is one, just a heads up!

chc9y ago

CSV is pretty poorly standardized, but if you're going by RFC-4180, Numbers is definitely right, due to the interaction of two rules:

* double quotes aren't allowed inside a field that isn't double-quoted

* double-quotes that do appear in a field have to be escaped by preceding them with another double quote.

So by RFC-4180, I'm pretty sure ""hehe"" shouldn't be possible, and the way to represent "hehe" with the quotes is """hehe""".

codezero9y ago

Cool, yep, thanks for pointing out the RFC. I don't envy anyone making CSV parsers and basically assume they won't work when I need to use them :)

kauegimenesOP9y ago

Here is all the logic related to this behavior: https://github.com/kauegimenes/jsonexport/blob/master/lib/co...

BrandiATMuhkuh9y ago· 2 in thread

trefn9y ago

(I work at Mixpanel)

kauegimenesOP9y ago

What you think about this syntax https://github.com/mmckegg/json-query?

tambourine_man9y ago· 2 in thread

Just yesterday I was looking for the other way around, CVS->JSON

TeMPOraL9y ago

- Wrap the entire file with { and }

- Wrap each line with [ and ]

- eval the file

;).

tambourine_man9y ago

It wasn't neatly quoted, which I assume is a must for JSON, but yeah, I ended up going on a different path.

kmike849y ago· 1 in thread

From a Python developer point of view, design is a bit surprising - why isn't it created as two separate libraries:

1) csv writer (and reader?) which takes care of all csv dialects crazyness; 2) a library which "flattens" nested objects/arrays?

(1) is not opinionated (besides a few API choices), it just has to be correct; it doesn't make much sense to re-implement (1) everywhere.

(2) can be more opinionated, it is easy to disagree with design choices, there is more room for personal preferences.

abritinthebay9y ago

It could be. The question is: why should it be?

Everything you said makes sense if those are the goals, but if they aren't... well, why should they be?

The design you suggest would be done by some module authors for sure, but there's no reason it needs to be that design.

freezer3339y ago· 1 in thread

Nice! If you want to do something similar with Excel xlsx this works: https://www.npmjs.com/package/jsonexcel Not sure how it handles huge arrays though...

kauegimenesOP9y ago

Thanks for sharing, you can also use jsonexport with rowDelimiter: '\t' for a xls compatible output.

madamelic9y ago· 1 in thread

This is a new one. Most people want to go the opposite way.

Either way, neat addition.

nolok9y ago

> Most people want to go the opposite way.

When you want to easily provide something excel/gsuite/... Will take in without using an heavier excel compatible library, CSV can be quite decent.

chid9y ago

There's a similar command in pandas called json_normalize (http://pandas.pydata.org/pandas-docs/stable/generated/pandas...)

j / k navigate · click thread line to collapse