1. physicists
2. Wall St. quants
3. game programmers
4. PhD statisticians
So, the problem is not that there aren't 6,600 people in the US who can do it; it's that the companies can't pay or don't want to pay the $200,000+ that would be required to hire them.
Instead of working as a data engineer, I'm working at a non-profit doing pretty much everything involving data for them, as well as running their appeals, and doing almost all of the analysis. I'll lead off by saying the biggest downside of working for this particular non-profit is the salary. However, there are a lot of things I like about this job:
1) Location: I want to be located in Chicago. I have 0 interest in moving out to the West Coast. I'm up in the air about working remotely, because I feel like there is a lot of value in working with people in person.
2) The role is very broad. I get to do a lot of exciting things with data, but it is also a marketing and communication role. I am included in nearly every strategic discussion, not just those pertaining to data or technology.
3) Work-life balance is very good. I am never expected to work more than 40 hours a week. My boss makes sure that everyone is focused on their lives, to the point where he basically kicked me out of the office for a week because I was waffling about taking a vacation. He makes sure that people know they aren't expected to check their email or do work on the VPN during off-hours.
4) The work I do makes a difference. Not in an "I make something people use" way, but in a "my work has rescued people from being homeless and fed starving kids" way. My first couple of jobs out of college were totally lacking this aspect, and I didn't realize how much it meant to me until I started working at a place like this.
I've been here a few years now, so it's approaching the time when I should start looking for a new job if I want to continue to grow, but I'm having trouble visualizing what that would be. From my perspective, the problem with hiring is that job listings really focus on titles rather than roles, even in smaller organizations. I think my best bet for finding an organization that matches the first two points, if not all four, is through my network rather than through job postings. So, to your point, the only way I see myself in a narrow-title role like a "data engineer" is if I really need money.
Just wanted to comment on this part - sadly it's difficult to make more than a pittance even with a PhD.
If you want to talk about a shortage of labor where it would matter, biology as a field is probably hurting way more for talented software engineers than any company that needs a data engineer. There are so many great applications for programming in biology, and unlike other sciences, say physics, researchers don't tend to pick up on any amount of programming skill on their way to their PhD.
I've tried getting involved in bioinformatics on the side, but it's really difficult to keep up with the field if you don't have thousands of dollars to drop on journal subscriptions. It's also really hard to get access to the data researchers use in general (in any field), and it is even harder when the research involves people, due to concerns about privacy. I don't think a focus on privacy is a bad thing, but a lot of publicly available data is sanitized to the point where your sample size would need to be in the billions to draw any inferences. You can request access to less general data, but good luck doing that without the support of a research organization.
Anyways, unless you have a martyr complex, there really isn't any reason to go into bioinformatics.
I'd just be happy with that. Most of the work I've done professionally hasn't gone anywhere; it's always "we missed the market window" or "upper management decided on a new strategy". I can't point to that many things I got to work on that actually made it into the market and were used by people for long. One place (a semiconductor company) had a successful though buggy product and large customers in place, with the product already deployed into the field, and the software I wrote got used by some customers, but then suddenly the company decided they weren't making a big enough profit margin on this part (even though the profits were guaranteed and extremely low-risk as the customers had the part designed-in), so they simply quit the market and laid off our entire team.
Making something people use would be a step up. Rescuing people and feeding starving kids is a pipe dream, but then again I work on embedded devices, not big data or analytics or anything like that, so that's not exactly a position that'd be easy for me to find even if I really wanted it.
And I would bet across all of the tech workers in the USA there are well more than 6k that could do this.
But the GP was particularly amusing to me because of its assertion that 'smart, quantitative people, regardless of industry, can build data infrastructures for startups.' I guess we could also say, there's little incentive to pay to train them (or for them to pay to train) to become a data engineer.
Source: Am physicist who'd love to find sustainable part-time work at market rates.