We’re in the Middle of a Data Engineering Talent Shortage

(blog.stitchdata.com)

143 points · hankmh · 9y ago · 159 comments

126 comments · 26 top-level

jnordwick · 9y ago · 44 in thread

Whenever I see these posts I immediately translate them in my head to "we're in the middle of a talent shortage at a price I am willing to pay."

I've worked with very large amounts of data and high performance computing for most of my career; I mostly had finance-related jobs in the last decade or so. I have most of the skills you want, including some you don't know you want. However, when salary comes up, that is where we start to part ways. If you are really serious about a shortage, you should be really serious about making offers that can be competitive, but I keep seeing the same $150k offers. That isn't a "shortage" kind of offer.

dizzystar · 9y ago

Are they looking for someone who must have every box ticked or are they looking for someone with enough qualifications yet needing work so much they are willing to undercut themselves? Are they justifying their salary offer because you tick 90% of the boxes and not 100%?

I've been looking for work in data engineering and databases for 9 months, and while I'm certainly not as qualified and experienced as you are, I consider myself capable. I've definitely passed the take home and whiteboard tests I've been given, etc.

When I read about a "shortage," I wonder if this is more indicative of unicorn searching than anything else.

SmellTheGlove · 9y ago

That to me is a classic recruiting problem in technical positions, data engineering included. Unless you have a manager handling it themselves, the person doing the initial screen really is ticking boxes because they may not know any better.

Once a resume gets to me, and I'm only speaking for myself here, I'm looking for the challenges you've faced and the problems you've solved. I actually care very little about what tech you used because odds are we'll have something different, but we'll need to solve problems. If someone is solid in some related technical skillset, can think critically, and communicate the details of what they've tackled in the past, learning our specific tech stack is going to be the easy part.

Let me put it another way - when I look for interns or entry level hires, the number of those that can do more than spell SAS or Teradata approaches zero very quickly. But if they've solved challenges of the magnitude that they'd be expected to solve with us initially, the tech is secondary to process and problem solving. As we look for more experienced hires, I'd still be limiting myself to candidates from a set of "legacy" industries that prefer these sorts of tools if I insisted on checking those boxes at the outset. I'd prefer to teach a really smart person to use the things that they don't know yet if I have it my way.

walshemj · 9y ago

Quite. I am sure my experience doing MR back in the early '80s for British Telecom would be useful today - but I suspect that I might struggle to get past the HR screen.

That was when 17 top-of-the-line superminis (Prime 750s) were a huge cluster (we were the largest non-bank user in the UK) - probably about the same as a 10-20k core Hadoop setup would be today.

blihp · 9y ago

Of course they must. By demonstrating that they cannot find candidates with 100% of the 'required' skills at the price they are willing to pay, the path is cleared to go the route of 'highly skilled' H1B applicants etc. with a small percentage of these skills. It is not, and has never been, about the skills.

SmellTheGlove · 9y ago

I think it's definitely true. Functionally, I'm a director of data engineering (with a big company, so my real title is way more generic). Usually in the initial screen, we'll talk general dollars, and my number is always out of range. For my level and the fact that I'm reasonably happy where I live now, the number is 200k + relocation (more for Bay Area, but let's not go there), and I don't think that's unreasonable for a director level who is presumably going to also develop your more junior DEs.

I don't fancy it up too much, either. I build teams that make the data move and land it clean so that your PhDs can do the smaaaht stuff with it. I can stack BI and Analytics on top, but a lot of people can do that starting from clean data - and clean data is what I do. But I do get the impression that we're viewed as janitors and plumbers - who you'd be thrilled to see at 3am when your shit(ter) broke, right?

sloxy · 9y ago

"director of data engineering"

This is already generic.

stale2002 · 9y ago

Although your statement is technically true, it is basically meaningless.

Yes, you can always always always find somebody to do a job if you're willing to pay 10 million dollars. That means that "shortages" are impossible. It means that you can never have a shortage in any situation, because you can always pay 10 million dollars for a single visit to the doctor.

But this line of logic isn't very useful when talking about "shortages".

If you had to pay a million dollars for a loaf of bread, is there a shortage of bread? I.e., billions of people will starve to death by next week, because they can't afford to buy food.

Most people would say "Yes, there is a shortage of bread".

When people talk about shortages, they are obviously talking about a shortage at a certain price point. There is no other definition of the word shortage that makes sense.

A good definition that I use for the term shortage is "If the government could snap its fingers and instantly produce large amounts of X overnight, would the world be a better place"?

If the answer is "Yes, the world would be in a much much better place", then that means there is a shortage of X. If the answer is "No, the world would only be a little better", then that means that there is NOT a shortage of X.

whybroke · 9y ago

There is no analogy to bread or 10 million dollar salaries.

A company found something it could profit from more if it paid less than current market value. That is all. They are not saying there are no qualified applicants. They are not saying they want 10 million dollars.

What is the case is that a business finds a resource (the perfect hire) that they wish to profit from but do not want to pay the market value for it because that would reduce profits. Rather than be satisfied with what would be an erosion of profit (or an admission of an unworkable business model), articles are posted to demand that government pressure wages downward.

If you want a bread analogy, it's as if I found a cheap source of bread I can sell elsewhere at a profit but then complain there's a shortage solely because the cheap stuff isn't even cheaper.

jnordwick · 9y ago

If they were offering $300k and still couldn't attract top talent from other industries, then we could have a discussion about a shortage, but the low six figures doesn't show a shortage situation. It shows a market with plenty of headroom for salaries to grow.

sbov · 9y ago

Haven't you just done a 180? I mean, I'm pretty sure the world would be in a much much better place if the government could snap its fingers and instantly produce large amounts of almost anything. Therefore there is a shortage of almost everything.

sksnxjis · 9y ago

Given the millions of unemployed Americans, it seems this is not true for at least some occupations.

Wal-Mart greeters can be wonderful people and I'm not saying they aren't valuable as humans. But in labor market terms, there is clearly not a shortage of them.

tuna-piano · 9y ago

Of course this is more or less always true - there are only shortages or excesses of things when prices don't or can't adjust freely.

If there was 1 gallon of water left on earth, Bill Gates would buy that gallon for $50 billion, and everyone else would die of dehydration.

There has always been a shortage of maids willing to do all my house work for $10.

And there is a shortage of data engineers at $x, but there wouldn't be a shortage at $1M/year (because fewer companies would want one, and more people would be willing to do the work).

beachstartup · 9y ago

> If there was 1 gallon of water left on earth, Bill Gates would buy that gallon for $50 billion, and everyone else would die of dehydration.

really? who would sell the last gallon of water on earth?

snovv_crash · 9y ago

Someone with a bunch of hydrogen, oxygen and a rudimentary knowledge of chemistry.

branchless · 9y ago

someone with liquid assets. I'll get my coat...

spoonie · 9y ago

Someone who just drank the second-to-last gallon. ;)

wickedlogic · 9y ago

If they needed water to prime the last pump on earth?

p4wnc6 · 9y ago

Disney would.

biztos · 9y ago

Maybe you should start a Data Science & Engineering consultancy. The same people who would offer $150K to an employee often have bosses who would love to spend $500K for a person-year of (contract) work if it comes with a high probability of success.

SmellTheGlove · 9y ago

I have thought about that. Many times. I have a couple of barriers, most of which are temporary:

1. I have student debt from my law degree, and I have a lower risk tolerance until that's paid off.

2. My daughter is 4, it's nice to be around for the early years, and the corporate gig is quite comfortable in terms of hours.

3. I'm in Maine. Most clients would require me to travel, which impacts #2.

I do have a former colleague here that started a data consultancy. I should grab a beer with him and see if we have common ground in the short term. It's not quite starting your own thing, but it might be fun.

snowwrestler · 9y ago

This argument comes up all the time on HN, but I don't think it means anything. It seems to me that the ability to fill an opening by offering more salary can't disprove a talent shortage, because it is always possible to do so.

Thought experiment: If 100 companies had openings for a skill set that only one person could deliver, all 100 companies could eventually fill their openings by sequentially outbidding each other for the services of that one person.

So how would we know if a talent shortage really exists for a certain job? I can think of a couple potential hints: if starting salaries are going up much faster than the national average, or if the unemployment rate for that job is much lower than the national unemployment rate. Either would seem to indicate that, relative to the job market as a whole, there was a greater demand than supply for that particular job.

trendia · 9y ago

In this case, though, there are way more than 6,600 people in the US that would be able to do that data engineering job, including:

1. physicists

2. Wall St. quants

3. game programmers

4. PhD statisticians

So, the problem is not that there aren't 6,600 people in the US that can do it, it's that the companies can't pay or don't want to pay the $200,000+ that would be required to hire them.

Declanomous · 9y ago

This comment will sound a bit self-serving, but it supports your point. I have most of the skills necessary to be a data engineer. My degree is in biology, but I nearly got a double major in computer science with a minor in math. (I wanted to work in bioinformatics, but it's nearly impossible to make more than a pittance without a PhD.) I didn't pursue the double major because I felt taking classes outside of those three fields was more useful to my development.

Instead of working as a data engineer, I'm working at a non-profit doing pretty much everything involving data for them, as well as running their appeals, and doing almost all of the analysis. I'll lead off by saying the biggest downside of working for this particular non-profit is the salary. However, there are a lot of things I like about this job:

1) Location: I want to be located in Chicago. I have 0 interest in moving out to the West Coast. I'm up in the air about working remotely, because I feel like there is a lot of value in working with people in person.

2) The role is very broad. I get to do a lot of exciting things with data, but it is also a marketing and communication role as well. I am included in nearly every strategic discussion, not just those pertaining to data or technology.

3) Work life balance is very good. I am never expected to work more than 40 hours a week. My boss makes sure that everyone is focused on their lives, to the point where he basically kicked me out of the office for a week because I was waffling about taking a vacation. He makes sure that people know they aren't expected to check their email or do work on the VPN during off-hours.

4) The work I do makes a difference. Not in a "I make something people use" difference, but in a "my work has rescued people from being homeless and fed starving kids" difference. My first couple of jobs out of college were totally lacking this aspect, and I didn't realize how much it meant to me until I started working at a place like this.

I've been here a few years now, and so it's approaching the time where I should start looking for a new job if I want to continue to grow, but I'm having trouble visualizing what that would be. From my perspective, the problem with hiring is that job listings really focus on titles rather than roles, even in smaller organizations. I think my best bet of finding an organization that matches the first two points, if not all four, is through my network rather than through job postings. So, to your point, the only way I see myself in a narrow-title role like a "data engineer" is if I really need money.

huac · 9y ago

I'm going to guess most of those people couldn't set up and scale a Hadoop cluster. Are they smart enough that they could learn this stuff? Sure! But there's still a skill mismatch here.

ISL · 9y ago

Or, offer the comparably-compensated part-time job that an academic physicist would accept in parallel with continuing to work in academia.

Source: Am physicist who'd love to find sustainable part-time work at market rates.

blahi · 9y ago

A PhD statistician can write ETLs and data infrastructure?

snovv_crash · 9y ago

Bullshit. I have the skills for an intermediate-level data engineer, but I find it bland and I'd rather work in computer vision. However, offer me enough and I may reconsider, and I don't think I'm alone in this.

Declanomous · 9y ago

I basically wrote the same thing as a reply to your sibling comment. Data engineering would have to pay a lot more than I currently make for it to be an option, and even then I'd probably change fields once I paid off my student loans and saved some money.

snowwrestler · 9y ago

You're not alone, but you're just reiterating that it's always possible to fill an opening by running the salary up high enough.

JamesBarney · 9y ago

Shortage has a specific meaning in economics, and that kind of shortage pretty much only happens because of price controls.

However if people start liking kale, and the price goes up 20% and you start telling people about the massive kale shortage people will think you're being a little histrionic.

achompas · 9y ago

Yeah, this is a selling problem. It feels like you're far more likely to gain traction starting a data team than taking an IC-track DE role. It's easier for companies to justify $200k+ for your skillset in that case, even if it takes you away from pure engineering.

Alternatively, you can just join a large tech org. Netflix etc. have no problem paying good DEs north of $200k in total comp.

tobyjsullivan · 9y ago

"we're in the middle of a talent shortage [and don't believe in upskilling]."

ThomPete · 9y ago

Upskilling is one of the most ineffective, costly ways to try and "re-program" workers, and it mostly doesn't work because it's not about skills, it's about talent.

linkregister · 9y ago

Talent that occurs through the genetic/epigenetic process of having attained a Data Science masters degree after earning a Computer Science degree?

I am a believer in inherent talent but Data Engineering is a skill set.

crispyambulance · 9y ago

I sort of agree in that orgs can't simply create massive education programs to re-purpose skill-sets/talent. That might have been possible "back in the day" before project managers were breathing down people's necks, but not today.

But the bright side is that talented people will find a way to "upskill" themselves in whatever environment they find themselves in. It is then up to the candidates to sell themselves and for the potential employers to be flexible about considering different backgrounds and nurturing the development of cross-functional skills that are needed for so-called data engineers.

The skills listed in the article are all fairly common, but it's hard to find enough of these skills within individuals. For example, it's not hard to find folks who can do the care and feeding of SQL Server databases, or skilled programmers, or analysts who understand the business domain intimately. The problem is getting all of these together in one individual at a "know-enough-to-be-dangerous" level.

etjossem · 9y ago

That's not always the case. Talent doesn't exist in a vacuum.

Someone with a natural talent for picking up new development skills will still learn data engineering far faster when provided with proper resources and strong internal mentorship.

I can see how you might make this observation after observing a poorly conducted training program.

duaneb · 9y ago

How would you distinguish talent from experience?

sdoowpilihp · 9y ago

This has been my experience with any "senior" engineering / BI / DS role. There is a particularly high level of price sensitivity to anything above 200k.

x0x0 · 9y ago

In particular, employers whining about lack of X need to ponder raising wages to where employees can afford homes in a city where prices are now within spitting distance of $1k/ft2. When your basic pitch is, "We desperately need [data engineers | machine learning engineers | computer vision engineers | what have you] so desperate to live in CA they'll accept never being able to afford a home unless our lottery tickets pay out", it should be unsurprising they have a hard time finding the talent they claim to need. Or, they could accept remote workers! Even remote workers near sfbay, who just don't want to burn 2.5 hours/day commuting in and out of sf...

harichinnan · 9y ago

We'd all like the $400K the investment bankers make. But the finance industry has developed a business where they could pay their workers $400K and still make a huge profit for their investors. Except for Googles and Facebooks, the average tech startup is not making Finance industry level profits.

Also, finance requires proper education and training. Not so much for app development. So for everyone who complains about getting $150K offers, there are 100 thousand people right here in the US applying for $60K technical analyst jobs.

__derek__ · 9y ago

> Except for Googles and Facebooks, the average tech startup is not making Finance industry level profits.

And they don't have finance/Google/Facebook level needs for data engineers. They can't reasonably claim to need top-level skills and then beggar out on the cost.

bobosha · 9y ago

>Whenever I see these posts I immediately translate them in my head to "we're in the middle of a talent shortage at a price I am willing to pay."

That's true for just about anything.

"there is no epipen crisis, only a crisis at what you are willing to pay"

"There is no poverty, only poverty at a given income level"

"there is no crime problem, only crime problem at a given crime level"

what you are saying is self-contradictory. If you (or others) are able to turn down 150K offers...you know what you are.

wavefunction · 9y ago

You must admit that the price of epipens is an artificially inflated one only possible due to government-imposed monopoly, not one driven by true market forces.

Poverty is simply a description of wealth and is always comparative. We can define poverty at any level we so desire.

One might argue that any crime is a problem, as long as it causes an issue for society or victims.

whenwillitstop · 9y ago

My experiences exactly, pinged by companies obsessively for my big data skills, all trying to pay me less than I am currently making.

mattnewton · 9y ago · 20 in thread

I'm trying to switch careers into "Data Engineering" now, as a full stack developer who is more interested in ML, and I've found almost no traction internally at my company or externally. It looks like I may just accept a full stack position at a good company that does a lot of data science for now, but thought I would ask - Where are all these jobs?

PaulHoule · 9y ago

"Data Engineering" is most of the work that needs to be done, but I think companies haven't identified it as a category.

From my P.O.V., "Full Stack Engineer" is a place you don't want to be because it means putting out fires with whatever junk javascript is in the front end. It seems like everybody who's built a serious javascript application has invented their own Virtual DOM because none of the popular Virtual DOM libraries are good for much other than wasting time and CPU cycles.

"Data Scientist" is a bad title in its own way, in the sense that "Computer Science" is bad, but worse. To a lot of people there is a Brahmin kind of attitude associated with "Scientist" -- i.e. an aversion to getting your hands dirty. Real world data is pretty dirty and you aren't going to get far in getting value out of it unless you spend 80-90% of your time dealing with the dirt.

pjmlp · 9y ago

There are "Full Stack Engineers" doing pure native applications, which is what I have been doing the last three years after escaping the web back into native land.

mattnewton · 9y ago

You are correct. I thought before that full stack meant building the app start to finish, but the reality is often closer to putting out other people's fires in every layer. It does pay well though and you learn a lot of what can go wrong.

PaulHoule · 9y ago

The fact that it pays well makes it a job you're likely to get laid off from. Most managers would rather hire two junior developers so they can screw it up faster or better yet hire some people in another country who are really fast and cheap at screwing it up.

achompas · 9y ago

My official title is "Data Scientist" although I'm closer to the "ML Engineer" someone else mentions in a child comment.

Frankly speaking, if your company doesn't need a data engineer, it won't hire one or move you into that role. They likely don't, either, if you're experiencing this pushback -- data engineers often develop ETL pipelines or data warehouses, both of which are very useful if your company has a data team and very useless if it does not.

That said, you may want to move closer to my role. There's actually a shortage of data-savvy people who can also write production software, and you would nicely complement a more research-inclined data scientist or analyst -- someone with far more experience with research/analysis than development.

p4wnc6 · 9y ago

> There's actually a shortage of data-savvy people who can also write production software, and you would nicely complement a more research-inclined data scientist or analyst -- someone with far more experience with research/analysis than development.

I experience the same problem with shortage-at-price-X in the field you describe. I'm a machine learning engineer with experience in MCMC methods, but I also have a lot of low-level Python and Cython experience, some intermediate experience with database internals, and lots of experience writing well-crafted code for production systems.

There are basically zero companies willing to pay what I'm seeking (which is a salary based on my previous job and a few offers I got around the time I took that job). In fact, in some of the more expensive cities, the real wage offered is far lower than other markets.

I've seen reputable, multi-billion dollar companies offering in the $140k range for this type of role in New York. That's wildly below anything reasonable for this sort of thing in New York. I've seen companies in Minneapolis offering $130k for the same kind of job -- and even that is still too low for Minneapolis! The same has been true in San Francisco as well.

Because these companies value you more for simply looking good on paper and looking good as a piece of office ornamentation when investors stroll through, and they view you as an arbitrary work receptacle closer to a software janitor than a statistical specialist, their whole mindset is about how to drive wages down.

Frankly, given the stresses of the job and the risk of burnout, I think it's actually a terrible time to be in the machine learning / computational stats employment field, despite all of the interesting new work and advances being made. The intellectual side is good, but the quality of jobs is through the floor.

geebee · 9y ago

"I've seen reputable, multi-billion dollar companies offering in the $140k range for this type of role in New York. That's wildly below anything reasonable for this sort of thing [in NY/SF]"

Man, do I ever agree. This is where the "shortage" argument falls apart.

This is why I'm so uninterested in the abstract arguments happening elsewhere on this topic about whether markets are failing and basic laws of supply and demand no longer apply at theoretical salary levels (10 million was offered as an example).

Why are we bothering with this debate, when it's so far from reality? I'd say that if you're trying to hire a very high skilled and critical tech worker in SF, and you just can't find one no matter how hard you try, and then I find out that you're only offering 140k a year?

In San Francisco and New York (and anywhere else in the US, really), that's nowhere close to the kind of pay where we should start scratching our heads about a shortage and start wondering why the usual laws of supply and demand aren't working anymore.

achompas · 9y ago

Yeah, I strongly believe companies haven't (or aren't willing to) figure(d) out the IC track problem for data people in the way they've figured it out for engineers. Part of me wonders if it even makes sense for them to figure it out, if they're not an Uber/Netflix/Amazon with a strong need for advanced ML abilities.

It sounds like you're a principal/lead/post-senior ML engineer; at that level, you can easily command more than $140k but you have fewer options to apply those skills at companies that really need them (because few companies actually need them).

I don't know. It's tough. I agree that it might be a terrible time to work in ML/computational stats because of stuff like this.

infinite8s · 9y ago

I suspect the reason is those companies offering $140k frankly don't need that level of expertise. With that kind of background it would be fairly easy to get 200-300k as an infrastructure engineer at a quant shop.

achompas · 9y ago

Oh, also: if you're in NYC I'd be happy to meet over a coffee/beer to swap stories. Feel free to use the contact info in my profile.

mattnewton · 9y ago

I think the company does need data engineers but wants someone with a graduate degree from Stanford or CMU in that position, even though the actual work is in building up infrastructure for those people. And I understand. I've only really got software engineering skills to contribute at this point and I'm picking up the ML from kaggles on the side; I am looking for a position that can increase my overlap between those, because learning at home while working on unrelated stuff is making me move slowly and painfully. Your experience sounds exactly like what I'm looking for - data-savvy writing production code, complementing a research-heavy team I can learn from. How did you get started in that?

achompas · 9y ago

I honestly fell into it by luck. I moved to NYC, studied machine learning in grad school, networked my ass off, and landed an internship.

From there I went full time as something of an ML engineer at a company with a strong tech culture, and learned as much as I could in both tech and ML/statistics. The rest is history (although I'm by no means a rockstar or whatever).

My path is hard to reproduce -- it starts with being in NYC or SF at a specific point in time, before the labor market became saturated with data science bootcamps and PhDs furiously learning Python while working on their dissertations.

Your best bet at this point is to produce a few data-related projects (maybe work on open source like scikit-learn and pandas?) and network like crazy. Someone somewhere will have a need for someone like you.

FLUX-YOU · 9y ago

>There's actually a shortage of data-savvy people who can also write production software

Well no kidding, that's one person doing two jobs. That's easily a 5-10 year training time depending on how high a quality you demand from their production software.

willis77 · 9y ago

We (Kaggle) run a data science jobs board (https://www.kaggle.com/jobs) that gets a few data engineer listings from time to time. Not all of these are active, but you may find a few interested companies via - https://www.google.com/#q=site:https://www.kaggle.com/jobs+%...

mattnewton · 9y ago

Thank you guys! Doing Kaggle competitions is what got me interested in seriously pursuing ML in the first place. You are all seriously awesome.

I'll look again at the board, but I didn't see anything there before that wanted software engineering skills (which I have with industry experience), and not a graduate degree (which I don't), and happened to be commutable from my place just south of the bay. But I will keep looking!

ironchef · 9y ago

I see tons of them. If you're interested in ML, you're probably more looking towards data science. Data engineering (in general) is more about getting the data in a state where it can be used (extracted, cleaned, moved, transformed, etc.), at least from what I've commonly seen in the industry. A decent breakdown is here: https://blog.insightdatascience.com/data-science-vs-data-eng...

alexbeloi · 9y ago

You might want to look at "Machine Learning Engineer" positions if you want to do ML in practice, it's starting to be a title I see somewhat often now.

As others have pointed out Data Engineering is more about building data pipelines, making architecture decisions for your ML stack, things like that. Less about model building, prototyping and training, which is what I think of when somebody says they 'do' ML.

mattnewton · 9y ago

Right, I'm not picky about the title. I'm looking at those positions too. The main thing is, I want to be able to contribute using my existing software engineering skills from day 1, while picking up the ML stuff. It's been really hard to basically work an unrelated job during the day and go home and do kaggles for practice, so I am hoping to get more of an intersection as a launching place. Anything touching the data or the models will do :)

minimaxir · 9y ago

ML falls more under a Data Science role than Data Engineering, although ML is much more difficult without proper Data Engineering.

bcbrown · 9y ago

You should put your email in your profile. If you're in Seattle, send me an email.

dmatthewson · 9y ago · 13 in thread

From the article: "Data engineers are the janitors who keep your data clean and flowing."

Hm, I wonder why he's having problems hiring janitors.

pavlov · 9y ago

Bizarrely, I remember a recent HN discussion where a poster was arguing that any software developer who is not working in machine learning is like a plumber.

I guess this means that the entire profession consists of janitors and plumbers.

jballanc · 9y ago

Considering that plumbers and janitors have likely, in the entire history of human civilization, done more for health and longevity than doctors and scientists...I'm kind of ok with this analogy.

hga · 9y ago

Doctors, maybe, but it was the scientists who told them about the germ theory of disease, for instance.

I've read, but not confirmed for myself, that in the US the biggest gains in health came in the post-Civil War period, when "plumbers and janitors" made the difference. Of course, that's really starting with, after the science, the civil engineers who designed the public works systems that supplied clean water and took away sewage, and let's not forget the politicians and the like who found it worthwhile to buy votes that way (now, they take our infrastructure for granted and buy votes more directly...).

p4wnc69y ago

In the minds of middle management, I think this is precisely correct.

triplesec9y ago

And, as has been pointed out so often, that is entirely why nobody wants to work for them. Respect these very bright people and you have a starting negotiation position.

biztos9y ago

I recently had a plumber do some work on a >100-year-old apartment. I was lucky: he's a very good plumber.

The job didn't involve too many "pipelines" but the knowledge and creativity required to make them work was well above what I see from most software developers.

"Plumber" is not the put-down that poster thought it was.

jrs2359y ago

Janitors? They are certainly more than janitors! More like plumbers... getting your data safely from point A to point B without plugging things up while passing through [process] boundaries. How much does a plumber cost? $140 / hr? Sounds about right.

kafkaesq9y ago

> Data engineers are the janitors who keep your data clean and flowing.

In a boldface font, no less. The cockiness behind that language is really quite astounding.

praccu9y ago

It's really true, though. It's brutal, ugly work with no hope of an end.

Edit: Favorite paper on the topic: http://research.google.com/pubs/pub43146.html

kafkaesq9y ago

So is the work that doctors, lawyers, and other highly-skilled people do, by and large. Everyone knows that day-to-day aspects of these jobs are hardly glamorous (or even cerebral), the vast majority of the time. Yet somehow we manage to accord these people with their due degree of respect, and wouldn't think of referring to them as "janitors".

jakestein9y ago

Hey I'm the author of this blog post and the CEO of the company that did the benchmark report. That was a very poor choice of words on my part, and I appreciate you flagging it. I reworked the paragraph to remove the janitor comment and (hopefully) make it clearer.

wavefunction9y ago

You should also not use "janitor" as a disparaging term. That would be another good takeaway from all of this.

jakestein9y ago

I agree that it's a bad idea to use "janitor" as a disparaging term, and that was very far away from my intention. If that was what you took away from reading it, then that's more evidence that I didn't do a great job with writing the original draft.

Here's the original paragraph for reference:

Data engineers are the janitors who keep your data clean and flowing. Insights are great, and you need them. But to deliver insights at scale, you need data infrastructure. That’s delivered by data engineering. It’s not as fun to talk about as D3 visualizations and business intelligence dashboards, but it’s every bit as important.

mrharrison9y ago· 10 in thread

We should rename this job position to Data Sanity Engineers.

I have been thrown these projects at work before, where I'm the frontend engineer and I need to make some cool D3 visualization, but lo and behold the data is shit, and I have to help the backend team make the data usable. It's a mind-numbing job that nobody wants, because it sounds like a one-month task to get a good REST API up and working, but it usually takes three months, because you have to go back and forth making sure the data is right, and there are always 10 tricky edge cases that you have to work some magic on. Not only that but you need to have smart people cleaning the data, so that you don't make some big mistake down the line or end up with a super slow REST API and have to add another couple of weeks or a month to rework the data again. So that one month becomes three months, and most likely a year, because somebody will say that looks great but can we also add this, and it goes on and on. It's literally a mind-numbing job that almost nobody wants. I have found that products like Tableau are the best for this; you still have to clean the data, but it helps speed up the process.

Data cleaning is a super golden problem to solve.

dizzystar9y ago

As a contradiction to this point, some people (me) really enjoy working with data, from cleaning, munging, creating, sorting, pipelining, etc, and find front-end visualization production excessively boring and mind-numbing.

Give me emacs and a command line, and I have all the truth I need, which is far more honest, in my mind, than anything that can be created with D3 or Tableau. Beauty is in the eye of the beholder, and it doesn't really do anyone service to look down on the work others find enjoyable. If doing D3 makes you happy, that is awesome, and I can only congratulate you for your passion and your ability to look forward to work I don't "get," and I wish the feelings would be mutual.

mrharrison9y ago

So I guess you are a data engineer? What makes it fun for you? How do you work with your customers to give them what they need in a timely manner? I would be interested to know what stack you use to go from dirty data to customer consumption.

dizzystar9y ago

Closer to an aspiring data engineer, though I've done my fair share of ETL, cleaning, database building / rebuilding, admin. Prior jobs have been database engineer, probably closer to DBA.

I just enjoy working with raw data and raw code more than I enjoy writing something that launches a graphic. I enjoy writing a script that finds a bad piece of data, or a script that fixes up everything, or converting something that was once unable to run at all into something that runs in 500ms. Perhaps it is that journey of constant discovery, and seeing that every situation is a unique little puzzle. It is seeing the world as it is with no one reinterpreting what the data means for me. I can explore it and discover what it really means. It is hollow truth, a mess of ideas converted to sets of ideas layered on sets of ideas, and when it is finally drawn down, converted, and passing all tests, it is self-evident and self-reflecting, and true. Hard to explain, but I suppose I like all the things people hate about it.

The tools matter about as much as it matters what CSS framework you are using. You have the ability to logic through UI and UX, whereas I do not. I have zero hope of ever doing well at what you do, since I simply don't have the foundation, but if it matters, I know most jobs I've applied to and worked at tend to be more ad hoc, using PL, Python, Ruby, etc.

banku_brougham9y ago

    sed -i 's/emacs/sublime-text/g' what_u_said.txt

ajamesm9y ago

more like Ctrl-H, tab, 'emacs', tab, 'sublime-text', tab, enter, esc, Ctrl-S

kafkaesq9y ago

> Not only that but you need to have smart people cleaning the data,

Which are difficult to find when you think of them as "janitors", and treat them accordingly.

msie9y ago

Data Sanitation Engineers

SmellTheGlove9y ago

I do it for a living. It seems underappreciated in the industry.

mrharrison9y ago

I agree. I enjoyed doing it the first couple of times, but people would often complain that I wasn't done sooner and didn't appreciate the level of complexity that went into doing it. Once the appreciation was gone, I believe that's when it turned into a mind-numbing task for me. I don't mean to belittle the job; I think I have just become sour on it because of the lack of appreciation.

SmellTheGlove9y ago

A big part of my role is getting out there in front of business partners to keep the things that we do well front of mind. If you manage this work in the traditional sense, you'll be invisible when things go well and shat-upon as soon as anything goes wrong. At my current organization, I've really had to work at this. Here's a story:

Once upon a time I managed (and, frankly, also wrote a lot of the code for) a project integrating half a dozen sources each managing a block of our business (billing, coverage, claims). The data was awful coming in and we managed to get a bunch of business processes changed in addition to some pretty heavy cleansing steps that we wrote. In any case, this big fragmented mess of monthly and weekly stacked data became my integrated, clean warehouse. For the first time ever at this organization, I had coverage and claims records tying up at a rate of 100% without any manual intervention. We did this so that we could implement a modern finance ops process on top (being intentionally vague) that would allow us to manage this block more efficiently, save time, and even let us better invest - it was a 2 year project including my data work. A handful of actuaries and analysts got promoted out of this as it was a BFD to the company. Yet, at the end of the year, when I got my review I got our equivalent of the average rating, 3 of 5, etc, and like a 3% raise, and a shitty budget for my people too. From then on, I spent almost as much time out there promoting our team's work as we did doing the work. We did considerably better the next year, and that's been the way I've operated ever since. I market the work.

This kind of work requires a manager who will actively market it within the organization.

ef5a0b06289y ago· 2 in thread

Every time something comes up on HN about a talent shortage in a field related to software engineering, it hurts. I have been unsuccessfully looking for a full time position since my last start up (I was not a founder) folded six months ago. I have been on over 25 in person interviews and gone through untold degrading whiteboard interviews, code tests, trick questions, and take home projects; all have ended in rejection. This industry has a need to torture candidates because we are all considered to be liars by default. Much is said about combating impostor syndrome in ourselves but we are too eager to engender it in others.

It seems people in this industry refuse to understand that some people are not perfect. I never graduated college because I hated it with the very fiber of my being, so I am not particularly great at white boarding answers to algorithm questions off the top of my head in a high pressure environment. If I need them during my job, I look up answers and learn from people who are much smarter than I am.

My personal identity has been shattered, as I thought my ~5-10 year history of success in the industry indicated I was in demand and talented. I saw posts like this and thought that if the worst happened I'd still be able to find a job. The idea that there is a talent shortage is a lie, or candidates like me wouldn't be treated as I have been. I'm not asking for a free job, or a handout. I have had a successful career so far and am capable of doing good work. But I'm not a specialist in Big Data Machine Learning Neural Networks.

I have struggled with bipolar disorder and suicidal ideation most of my life. I've dealt with the death of my beloved grandmother and my father who was instrumental in my choosing to be an engineer with only minor lapses in control. Nothing has caused me to consider taking my own life as much as the past 6 months. It seems there is no future for me in the only career I have any skill in and which is a huge part of my identity. And to constantly be told that there is such a shortage of engineers only salts the wound.

googletazer9y ago

" I have been on over 25 in person interviews and gone through untold degrading whiteboard interviews, code tests, trick questions, and take home projects; all have ended in rejection."

The fact that you pulled through 25 of them is already commendable. Unfortunately as a labor provider you'll be subjected to all kinds of crap for the privilege of working.

Every single person on here needs to have a secondary business going on right now. Doesn't have to be a highly skilled industry either, selling hand made stuff on Etsy can be a lifeline in these situations.

ultramagas9y ago

Hey, I'm going through something similar. I had to quit an amazing job because my wife and I pursued a dream and moved to Europe (no remote).

I had always had an easy time getting a job before but this time it was different. Granted I knew it'd be tougher since for remote jobs, the world is the competition. But it was a summer of endless shitty timed hackerrank-style tests (virtual whiteboard hazing). I would tell my co-workers about them and they'd laugh in bewilderment at the questions that were asked in what should be a technical screener, and these are extremely smart and productive software guys that have started companies, written books, give conference talks. One funny question I got for a frontend React job: write a function that takes a sequence of bits that represent a negative-binary number (not a base-2 number that is negative, but a base-(-2) number) and return its negated value in base-2. For a frontend job. It was one of 4 questions to be answered in 90 minutes. gtfo.
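For anyone curious what that question actually involves, here is a rough sketch of one way to read it. It assumes the bits arrive as a string with the most significant digit first and that the negated result should also be written in base -2 (the wording above is ambiguous on that point); the function names are made up for illustration.

```python
def negabinary_to_int(bits: str) -> int:
    """Interpret a string of 0/1 digits as a base-(-2) number, MSB first."""
    value = 0
    for b in bits:
        value = value * -2 + int(b)
    return value


def int_to_negabinary(n: int) -> str:
    """Write an ordinary integer as a base-(-2) digit string, MSB first."""
    if n == 0:
        return "0"
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:
            # Python gives a remainder of 0 or -1 here; shift it to 0 or 1.
            n, r = n + 1, r + 2
        digits.append(str(r))
    return "".join(reversed(digits))


def negate_negabinary(bits: str) -> str:
    """Negate a base-(-2) number by going through a plain integer."""
    return int_to_negabinary(-negabinary_to_int(bits))
```

Going through a plain integer is the lazy route; a digit-by-digit negation exists but is fiddlier, which is rather the point about asking it in a timed frontend screen.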

A few companies would reply, most strung me along while -- I realize now -- they were keeping me as a backup(-backup) in case their "A-player" turned them down. Countless interviews, hours on takehome projects, it was tough. I learned to cut bait if the company was slow to move forward, had weeklong periods of no communication, etc.

I (just very recently) found it's easier to land small contract gigs because the barrier to entry seems to be lower, demonstrate value, and keep getting work from those guys after the initial project was done. It is different but so far I actually like the freedom that comes with contracting. I haven't been at it long enough to experience the downsides.

There's definitely not a shortage of talent. It's that every company thinks they need "A-players", when the vast, vast majority are doing a damn basic CRUD app.

Just wanted to say I hear you brother and share my story in some solidarity. You will find something, just keep plugging away. Each "failed" attempt makes you better no matter how many attempts it takes. Cliche of course but it is true. I am very lucky in that I don't face the mental demons you do, even then this job search hit me pretty hard. Please be proactive and take care of yourself, body and mind (body goes a long way toward mind also).

protomyth9y ago· 2 in thread

I worked for about 10 years doing exactly what they want, but I ended up having to write a lot of the tools which means I'm not able to check the boxes on some tool you require which gets me punted by HR.

I'm starting to think that the message is if HR is going to do checklists then developers should really make sure they work mostly with contracts that use popular checklist items.

mulmen9y ago

As a data person I would really like to put some numbers on how much the typical HR hiring process costs a business. I don't know anybody who says they are happy with how hiring works in the tech industry, but I've also never seen an HR person try to improve the process.

pyb9y ago

That's because the system is already optimised for the needs of HR people.

makmanalp9y ago· 2 in thread

Quick sidenote, anyone know where the databases / distributed systems engineering jobs are at? E.g. if one wanted to not use these tools but also go help build these tools?

I can think of Facebook, Google, Microsoft, IBM (which locations and groups within these companies / where?). I can also think of Confluent, CitusDB, Databricks, etc.

rhizome9y ago

Market Research is a $40B industry that depends almost completely on these concepts. I'm not sure how prevalent distributed systems are with MR companies, but that's an implementation detail anyway.

serge2k9y ago

> that's an implementation detail anyway.

Which is what the poster was asking for.

moandcompany9y ago· 2 in thread

I am a data engineer working on a machine learning team with models actively used as part of our product(s).

From my experiences working in various contexts (applied machine learning, analytics, policy research, academics, etc...), there are several factors that contribute to this shortage: (1) "data engineering" often requires a lot of breadth and knowledge, (2) "data engineering" is often (derisively and naively) referred to as the "janitorial work" of data science, (3) the spectrum of roles and requirements within the "data engineering" domain, in terms of job descriptions, can range from database systems administration, to ETL, to data warehousing, curation of data services / APIs, business intelligence, to the design/deployment/operation of pipelines and distributed data processing and storage systems (these aren't mutually exclusive, but often job descriptions fall into one of these stovepipes).

Some of my quick thoughts and anecdata:

Companies have made large investments in creating 'data science' teams, and many of those companies have trouble realizing value from those investments.

A part of this stems from investments and teams with no tangible vision of how that team will generate value. And there are several other contributing factors…

"Dirty work." People haven't learned how to, and more often don't want to do it. There's a vast number of tutorials and boot camps out there that teach newcomers how to "learn data science" with clean datasets -- this is ideal for learning those basics, but the real world usually does not have clean or ideal datasets -- the dataset may not even exist -- and there are a number of non-ideal constraints.

There are people that wish to call themselves “data scientists” that “don’t want to write code” and would “prefer to do the analysis and storytelling”

Engineering as the application of science with real world constraints: there are a number of factors that we take into account, often acquired through painful experience, that aren’t part of these tutorials, bootcamps, or academic environments.

Many “data scientists” I’ve met have a hard time adapting to and working with these constraints (e.g. we believe that the application of data science would solve/address __ problem, but: how do we know and show that it works and is useful? what are the dependencies, and costs of developing and applying that solution? is it a one-time solution, or is it going to be a recurring application? does the solution require people? who will use it? what are the assumptions or expectations of those operators and users? is it suitable? is it maintainable? is it sustainable? how long will it take? what are the risks involved and how do we manage them? is it re-usable, and can we amortize its costs over time? is it worth doing? This is part of a methodology that comes from experience, versus what is taught in data science)

Larger teams with more people/financial/political resources can specialize and take advantage of these divisions of labor, which helps recognize the process aspects of applying data science and address some of the above

Short story: if you view data engineering as "janitorial work" you're missing the big picture

Anyone else notice that the attributes of a 'unicorn' data scientist include the traits of a 'data engineer?'

vijayr9y ago

How does one get started with this? I suppose a lot of people who hang out at HN are competent devs good in programming and databases, but probably beginners in math, ML, AI etc. How does such a person get started and find a job in this field?

beachstartup9y ago

in my mind the problem is really simple: most executives aren't smart enough to understand how any of this shit works, or build a compelling business case around it. they just know they need a 'big data' team, so it just dies on the vine.

someone with enough smarts to build/lead a team, sell to executive management, and have an actual business application is just too rare compared to the prevalence of the engineering talent.

skynetv29y ago· 1 in thread

Anything and everything is marketed as "data science" and "data engineering" these days because this is the buzzword of the day.

I've been dealing with large data since before "big data" was a word, but I don't call myself a "data scientist" or "data engineer". I am still a software engineer working on what benefits my organization.

"Serial Entrepreneur" is the same these days, claimed by anyone who had a lemonade stand as a kid.

Swizec9y ago

> I am still a software engineer working on what benefits my organization

But if you saw a nearby local maximum that's higher than your current local maximum, wouldn't you change what you call yourself, if it means being paid more but doing the same work?

This is similar to how the average "software engineer" makes about $30k/year more than the average "programmer".

GeneralMayhem9y ago· 1 in thread

There are 6600 jobs listed and 6500 individuals on LinkedIn with that particular title, and therefore there's a shortage? Seriously?

* How many aren't on LinkedIn?

* Since the whole article is about how the job title is poorly defined and growing in prevalence, why would you assume that people who don't already have such a job would use the term?

* The "growth" charts on the full study are just as bad - how much of that is just from renaming existing generic developer positions, since "data engineer" is clearly a relatively new term?

sportanova9y ago

6500 data engineers on all of LinkedIn, but 6600 job openings in the Bay Area. So there are more job openings in one area than all data engineers on LinkedIn.

binalpatel9y ago· 1 in thread

The fact that the original, unmodified article referred to data engineers as "janitors" pretty much says it all.

It's very analogous to front-office and back-office work in Investment Banking. "Data Scientist" are the front-office, with all the prestige, and "Data Engineers" are the back-office, doing a lot of the heavy lifting without nearly as much recognition.

In my opinion there shouldn't be a delineation. You shouldn't be a data scientist if you can't gather, process, and clean up your own data.

biztos9y ago

Ideally you'd have a symbiosis, and each side would recognize the importance of the other.

Even if you require your data scientists to be able to do engineering work, it's probably way more efficient to have some good generalist Software Engineers doing all the "pre-math" work and freeing your statisticians up for what they're (hopefully) good at.

Plus as a side effect, your software will probably be better.

otto_ortega9y ago· 1 in thread

Am I the only one who thinks there will be a ton of people changing their job title on LinkedIn to "Data Engineer" as a result of this article?

collyw9y ago

I am thinking about it. Actually a friend recommended that I change my title to Data Engineer a few months back.

collyw9y ago· 1 in thread

So I know SQL, Python, Django, Java (though it's been a while), JavaScript, Linux, some cloud computing and a bit of devops. Am I a data engineer? A software engineer with a lot of database background? What makes a data engineer different from a software engineer?

njd9y ago

- The challenge for an organization is to recognize that there is a significant difference between the 'data engineer' working on a vertical project and the 'data engineer' responsible for integrating data across the enterprise.

- The project 'data engineer', in today's world, most likely will be a software developer responsible for ETL, etc. The data design will be more or less up to the software developer.

- An enterprise 'data engineer' is more concerned with data that affects the enterprise. This typically involves some sort of data integration. For example, how to integrate relevant data from N projects (e.g. A, B, C .. Z) where each project has its own idea of how to represent similar concepts (e.g. person, user, customer), with different provenance, truth assertions, access rules, data retention periods, granularity of metadata (e.g. at the attribute level vs entity level), etc. The enterprise is interested in questions like 'What did we know and when did we know it?', etc. The enterprise 'data engineer' will probably levy requirements on the project 'data engineer' to meet the enterprise's needs.
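To make the integration point a bit more concrete, here is a minimal sketch of two projects that model the same concept differently being mapped onto one enterprise-level record that keeps provenance and retention. Everything in it (the `CanonicalPerson` type, the field names, the two project layouts) is hypothetical and only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CanonicalPerson:
    """Hypothetical enterprise-level record for person/user/customer."""
    source: str         # provenance: which project the record came from
    source_id: str      # identifier as known inside that project
    full_name: str
    retain_until: date  # per-source data retention limit


def from_project_a(row: dict) -> CanonicalPerson:
    # Assumed layout: project A calls the concept a "user" with a single name field.
    return CanonicalPerson(
        source="project_a",
        source_id=str(row["user_id"]),
        full_name=row["name"].strip(),
        retain_until=date.fromisoformat(row["expires"]),
    )


def from_project_b(row: dict) -> CanonicalPerson:
    # Assumed layout: project B calls it a "customer" and splits the name.
    return CanonicalPerson(
        source="project_b",
        source_id=row["customer_ref"],
        full_name=f'{row["first"].strip()} {row["last"].strip()}',
        retain_until=date.fromisoformat(row["retention_end"]),
    )
```

The mapping code is the easy part; the requirements the enterprise side levies on each project (which fields must exist, how provenance is recorded) are where the real work tends to be.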

tom_b9y ago

Ignoring the breathless nature of the article, this is a buzzword label for a commodity skill set that pays a commodity salary in tech. It is also the commodity skill set that my employers have all paid me for.

There has been for a long time hype around new technology and labels for business intelligence, data warehousing, big data, and now data engineering/science. I'm not saying there are not some roles in this space that return huge value to organizations, but that these opportunities are much rarer than the buzz indicates.

I wonder if the perceived shortage is mainly hype as the shift to new cloud technologies makes many of the older ideas a little less useful - if you are plowing data into BigQuery, you probably aren't so worried about your star schema data model for reporting.

I would strongly advise people that look at these types of articles to look at the roles in question and ask "Is this role on the critical path to customers paying us?" My experience has been that the answer is often "No." This is bad. I have also seen situations where businesses that do rely on smart data integration can show that they are selling dollar bills for ten cents that still have trouble getting customers on board with spending that ten cents. Business is weird.

rch9y ago

I've heard more than one CTO/Sr. Engineer refer to people in these roles as 'data grunts' or something similarly dismissive. Then they're mystified as to why solid engineers are so quick to move up or out, year after year.

jboggan9y ago

It's digital Charlie Work [0], that's why.

I really enjoy that kind of work but it is difficult to articulate your business value in that environment. The best thing is working closely with a data scientist/front-end dev who can deliver products to the analysts and executives that need the data and make sure that you get the credit for enabling new streams of data. But most of the time you are putting out someone else's dumpster fire.

One advantage of data engineering: unlike front-end work, there are few non-technical people who will have an opinion on how you are doing things and burden you with bikeshedding.

[0] - http://www.avclub.com/tvclub/its-always-sunny-philadelphia-c...

ThePhysicist9y ago

Data engineering sounds much better than "data plumbing", but in my experience the latter is a more accurate description of the work of a data engineer: Building -and often unclogging- pipes that transport data from A to B, and putting in filters to clean it and extract the useful bits.

So why not change your LinkedIn job title to "data plumber", which is sure to get you some serious recruiter attention ;)

untilHellbanned9y ago

Ahh the ol' write a post about a not well understood distinction and then proceed to not explain the distinction.

Looks like we need more English engineers too.

cutler9y ago

I'm puzzled at the omission of Scala and Spark in this report.

lifeisstillgood9y ago

Weirdly the problem is most hires have it backwards.

Before going out to the market and discovering what talent exists and consequently what salary it will take to get them to join (ie negotiate) most organisations decide on a salary range, usually reflecting the current internal structure not the current external market.

The longer an organisation has existed the more out of whack with the market its internal set up is.

As such companies decide on their price point first, then go looking. Which is of course backwards.

realworldview9y ago

We surely need data mechanics.

slantedview9y ago

These "shortage" stories always make me roll my eyes, because they're usually about money more than anything. And money is usually about cost of living more than anything.

If you choose to locate your company in one of the highest cost of living regions in the world, then you are complicit in the "shortage". Supply and demand - pay up. Or don't.

cheriot9y ago

It was only 20 years ago that companies hired a "web master" or a generalist to do everything. But pieces of those jobs became specialized. Now we need UX, UI programmer, general engineers, dev ops, data engineers, a data scientist, etc.

And how many companies are still interviewing with fizzbuzz?

LawrenceHecht9y ago

Just checked, the # of data engineers rose to 9,246 (42%) in the last six months. So, the shortage is at least being addressed by people changing their job titles on LinkedIn.
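A quick check of that percentage, assuming the starting point is the 6,500 LinkedIn figure quoted earlier in the thread (the function name is just for illustration):

```python
def growth_percent(old: int, new: int) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


# 6,500 is the assumed baseline from earlier comments; 9,246 is the new count.
print(f"{growth_percent(6500, 9246):.1f}%")  # prints 42.2%
```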

wpiel9y ago

What I've learned from the comments: If something is valuable, there is a shortage of it.

I'm not even sure if I'm being sarcastic.

edoceo9y ago

We hire only the best! We only hire the top 1% of candidates.

But only 1 out of 100 are qualified :(

walshemj9y ago

Quite. I am sure my experience doing MR back in the early 80's for British Telecom would be useful today - but I suspect that I might struggle to get past the HR screen.

That was when 17 top-of-the-line superminis (Pr!me 750s) was a huge cluster (we were the largest non back user in the UK) - probably about the same as a 10-20k core Hadoop setup would be today.

blihp9y ago

SmellTheGlove9y ago

sloxy9y ago

"director of data engineering"

This is already generic.

stale20029y ago

Although your statement is technically true, it is basically meaningless.

But this line of logic isn't very useful when talking about "shortages".

If you had to pay a million dollars for a loaf of bread, is there a shortage of bread? IE, billions of people will starve to death by next week, because they can't afford to buy food.

Most people would say "Yes, there is a shortage of bread".

When people talk about shortages, they are obviously talking about a shortage at a certain price point. There is no other definition of the word shortage that makes sense.

A good definition that I use for the term shortage is "If the government could snap its fingers and instantly produce large amounts of X overnight, would the world be a better place"?

whybroke9y ago

There is no analogy to bread or 10 million dollar salaries.

If you want a bread analogy, it's as if I found a cheap source of bread I can sell elsewhere at a profit but then complain there's a shortage solely because the cheap stuff isn't even cheaper.

jnordwick9y ago

sbov9y ago

sksnxjis9y ago

Given the millions of unemployed Americans, it seems this is not true for at least some occupations.

Wal-Mart greeters can be wonderful people and I'm not saying they aren't valuable as humans. But in labor market terms, there is clearly not a shortage of them.

tuna-piano9y ago

Of course this is more or less always true - there are only shortages or excesses of things when prices don't or can't adjust freely.

If there was 1 gallon of water left on earth, Bill gates would buy that gallon for $50 billion, and everyone else would die of dehydration.

There has always been a shortage of maids willing to do all my house work for $10.

And there is a shortage of data engineers at $x, but there wouldn't be a shortage at $1M/year (because less companies would want one, and more people would be willing to do the work).

beachstartup9y ago

> If there was 1 gallon of water left on earth, Bill gates would buy that gallon for $50 billion, and everyone else would die of dehydration.

really? who would sell the last gallon of water on earth?

snovv_crash9y ago

Someone with a bunch of hydrogen, oxygen and a rudimentary knowledge of chemistry.

branchless9y ago

someone with liquid assets. I'll get my coat...

spoonie9y ago

Someone who just drank the 2nd-last gallon. ;)

wickedlogic9y ago

If they needed water to prime the last pump on earth?

p4wnc69y ago

Disney would.

biztos9y ago

SmellTheGlove9y ago

I have thought about that. Many times. I have a couple of barriers, most of which are temporary:

snowwrestler9y ago

trendia9y ago

In this case, though, there are way more than 6,600 people in the US that would be able to do that data engineering job, including:

1. physicists

2. Wall St. quants

3. game programmers

4. PhD statisticians

So, the problem is not that there aren't 6,600 people in the US that can do it, it's that the companies can't pay or don't want to pay the $200,000+ that would be required to hire them.

Declanomous9y ago

huac9y ago

I'm going to guess most of those people couldn't set up and scale a Hadoop cluster. Are they smart enough that they could learn this stuff? Sure! But there's still a skill mismatch here.

ISL9y ago

Or, offer the comparably-compensated part-time job that an academic physicist would accept in parallel with continuing to work in academia.

Source: Am physicist who'd love to find sustainable part-time work at market rates.

blahi9y ago

A PhD statistician can write ETLs and data infrastructure?

snovv_crash9y ago

Declanomous9y ago

snowwrestler9y ago

You're not alone, but you're just reiterating that it's always possible to fill an opening by running the salary up high enough.

JamesBarney9y ago

Shortage has a specific meaning in economics, and it pretty much only happens because of price controls.

However, if people start liking kale, the price goes up 20%, and you start telling people about the massive kale shortage, people will think you're being a little histrionic.

achompas9y ago

Alternatively, you can just join a large tech org. Netflix etc. have no problem paying good DEs north of $200k in total comp.

tobyjsullivan9y ago

"we're in the middle of a talent shortage [and don't believe in upskilling]."

ThomPete9y ago

Upskilling is one of the most ineffective, costly ways to try to "re-program" workers, and it mostly doesn't work because it's not about skills, it's about talent.

linkregister9y ago

Talent that occurs through the genetic/epigenetic process of having attained a Data Science masters degree after earning a Computer Science degree?

I am a believer in inherent talent but Data Engineering is a skill set.

crispyambulance9y ago

etjossem9y ago

That's not always the case. Talent doesn't exist in a vacuum.

Someone with a natural talent for picking up new development skills will still learn data engineering far faster when provided with proper resources and strong internal mentorship.

I can see how you might make this observation after observing a poorly conducted training program.

duaneb9y ago

How would you distinguish talent from experience?

sdoowpilihp9y ago

This has been my experience with any "senior" engineering / BI / DS role. There is a particularly high level of price sensitivity to anything above 200k.

x0x09y ago

harichinnan9y ago

__derek__9y ago

> Except for Googles and Facebooks, the average tech startup is not making Finance industry level profits.

And they don't have finance/Google/Facebook level needs for data engineers. They can't reasonably claim to need top-level skills and then beggar out on the cost.

bobosha9y ago

>Whenever I see these posts I immediate translate them in my head to "we're in the middle of a talent shortage at a price I am willing to pay."

That's true for just about anything.

"there is no epipen crisis, only a crisis at what you are willing to pay"

"There is no poverty, only poverty at a given income level"

"there is no crime problem, only crime problem at a given crime level"

what you are saying is self-contradictory. If you (or others) are able to turn down 150K offers...you know what you are.

wavefunction9y ago

You must admit that the price of epipens is an artificially inflated one only possible due to government-imposed monopoly, not one driven by true market forces.

Poverty is simply a description of wealth and is always comparative. We can define poverty at any level we so desire.

One might argue that any crime is a problem, as long as it causes an issue for society or victims.

whenwillitstop9y ago

My experience exactly: I get pinged obsessively by companies for my big data skills, all trying to pay me less than I am currently making.

mattnewton9y ago· 20 in thread

PaulHoule9y ago

"Data Engineering" is most of the work that needs to be done, but I think companies haven't identified it as a category.

pjmlp9y ago

There are "Full Stack Engineers" doing pure native applications, which is what I have been doing the last three years after escaping the web back into native land.

mattnewton9y ago

PaulHoule9y ago

achompas9y ago

My official title is "Data Scientist" although I'm closer to the "ML Engineer" someone else mentions in a child comment.

p4wnc69y ago

geebee9y ago

"I've seen reputable, multi-billion dollar companies offering in the $140k range for this type of role in New York. That's wildly below anything reasonable for this sort of thing [in NY/SF]"

Man, do I ever agree. This is where the "shortage" argument falls apart.

achompas9y ago

I don't know. It's tough. I agree that it might be a terrible time to work in ML/computational stats because of stuff like this.

infinite8s9y ago

achompas9y ago

Oh, also: if you're in NYC I'd be happy to meet over a coffee/beer to swap stories. Feel free to use the contact info in my profile.

mattnewton9y ago

achompas9y ago

I honestly fell into it by luck. I moved to NYC, studied machine learning in grad school, networked my ass off, and landed an internship.

FLUX-YOU9y ago

>There's actually a shortage of data-savvy people who can also write production software

Well no kidding, that's one person doing two jobs. That's easily a 5-10 year training time depending on how high a quality you demand from their production software.

willis779y ago

mattnewton9y ago

Thank you guys! Doing Kaggle competitions is what got me interested in seriously pursuing ML in the first place. You are all seriously awesome.

ironchef9y ago

alexbeloi9y ago

You might want to look at "Machine Learning Engineer" positions if you want to do ML in practice, it's starting to be a title I see somewhat often now.

mattnewton9y ago

minimaxir9y ago

ML falls more under a Data Science role than Data Engineering, although ML is much more difficult without proper Data Engineering.

bcbrown9y ago

You should put your email in your profile. If you're in Seattle, send me an email.

dmatthewson9y ago· 13 in thread

From the article: "Data engineers are the janitors who keep your data clean and flowing."

Hm, I wonder why he's having problems hiring janitors.

pavlov9y ago

Bizarrely, I remember a recent HN discussion where a poster was arguing that any software developer who is not working in machine learning is like a plumber.

I guess this means that the entire profession consists of janitors and plumbers.

jballanc9y ago

Considering that plumbers and janitors have likely, in the entire history of human civilization, done more for health and longevity than doctors and scientists...I'm kind of ok with this analogy.

hga9y ago

Doctors, maybe, but it was the scientists who told them about the germ theory of disease, for instance.

p4wnc69y ago

In the minds of middle management, I think this is precisely correct.

triplesec9y ago

and as pointed out so much, is entirely why nobody wants to work for them. Respect these very bright people and you have a starting negotiation position.

biztos9y ago

I recently had a plumber do some work on a >100-year-old apartment. I was lucky: he's a very good plumber.

The job didn't involve too many "pipelines" but the knowledge and creativity required to make them work was well above what I see from most software developers.

"Plumber" is not the put-down that poster thought it was.

jrs2359y ago

kafkaesq9y ago

> Data engineers are the janitors who keep your data clean and flowing.

In a boldface font, no less. The cockiness behind that language is really quite astounding.

praccu9y ago

It's really true, though. It's brutal, ugly work with no hope of an end.

Edit: Favorite paper on the topic: http://research.google.com/pubs/pub43146.html

kafkaesq9y ago

jakestein9y ago

wavefunction9y ago

You should also not use "janitor" as a disparaging term. That would be another good takeaway from all of this.

jakestein9y ago

Here's the original paragraph for reference:

mrharrison9y ago· 10 in thread

We should rename this job position to Data Sanity Engineers.

Data cleaning is a super golden problem to solve.

dizzystar9y ago

mrharrison9y ago

dizzystar9y ago

Closer to an aspiring data engineer, though I've done my fair share of ETL, cleaning, database building / rebuilding, admin. Prior jobs have been database engineer, probably closer to DBA.

banku_brougham9y ago

    sed -i 's/emacs/sublime-text/g' what_u_said.txt

ajamesm9y ago

more like Ctrl-H, tab, 'emacs', tab, 'sublime-text', tab, enter, esc, Ctrl-S

kafkaesq9y ago

> Not only that but you need to have smart people cleaning the data,

Which are difficult to find when you think of them as "janitors", and treat them accordingly.

msie9y ago

Data Sanitation Engineers

SmellTheGlove9y ago

I do it for a living. It seems underappreciated in the industry.

mrharrison9y ago

SmellTheGlove9y ago

This kind of work requires a manager who will actively market it within the organization.

ef5a0b06289y ago· 2 in thread

googletazer9y ago

"I have been on over 25 in-person interviews and gone through untold degrading whiteboard interviews, code tests, trick questions, and take home projects; all have ended in rejection."

The fact that you pulled through 25 of them is already commendable. Unfortunately as a labor provider you'll be subjected to all kinds of crap for the privilege of working.

ultramagas9y ago

Hey, I'm going through something similar. I had to quit an amazing job because my wife and I pursued a dream and moved to Europe (no remote).

There's definitely not a shortage of talent. It's that every company thinks they need "A-players", when the vast, vast majority are doing a damn basic CRUD app.

protomyth9y ago· 2 in thread

I'm starting to think that the message is if HR is going to do checklists then developers should really make sure they work mostly with contracts that use popular checklist items.

mulmen9y ago

pyb9y ago

That's because the system is already optimised for the needs of HR people.

makmanalp9y ago· 2 in thread

Quick sidenote, anyone know where the databases / distributed systems engineering jobs are at? E.g. if one wanted to not use these tools but also go help build these tools?

I can think of Facebook, Google, Microsoft, IBM (which locations and groups within these companies / where?). I can also think of Confluent, CitusDB, Databricks, etc.

rhizome9y ago

Market Research is a $40B industry that depends almost completely on these concepts. I'm not sure how prevalent distributed systems are with MR companies, but that's an implementation detail anyway.

serge2k9y ago

> that's an implementation detail anyway.

Which is what the poster was asking for.

moandcompany9y ago· 2 in thread

I am a data engineer working on a machine learning team with models actively used as part of our product(s).

Some of my quick thoughts and anecdata:

Companies have made large investments in creating 'data science' teams, and many of those companies have trouble realizing value from those investments.

A part of this stems from investments and teams with no tangible vision of how that team will generate value. And there are several other contributing factors…

There are people that wish to call themselves “data scientists” that “don’t want to write code” and would “prefer to do the analysis and storytelling”

Short story: if you view data engineering as "janitorial work" you're missing the big picture

Anyone else notice that the attributes of a 'unicorn' data scientist include the traits of a 'data engineer?'

vijayr9y ago

beachstartup9y ago

someone with enough smarts to build/lead a team, sell to executive management, and have an actual business application is just too rare compared to the prevalence of the engineering talent.

skynetv29y ago· 1 in thread

anything and everything is marketed as "data science" and "data engineering" these days because this is the buzzword of the day.

"Serial Entrepreneur" is the same these days, claimed by anyone who had a lemonade stand as a kid.

Swizec9y ago

> I am still a software engineer working on what benefits my organization

But if you saw a nearby local maximum that's higher than your current local maximum, wouldn't you change what you call yourself, if it means being paid more but doing the same work?

This is similar to how the average "software engineer" makes about $30k/year more than the average "programmer".

GeneralMayhem9y ago· 1 in thread

There are 6600 jobs listed and 6500 individuals on LinkedIn with that particular title, and therefore there's a shortage? Seriously?

* How many aren't on LinkedIn?

* Since the whole article is about how the job title is poorly defined and growing in prevalence, why would you assume that people who don't already have such a job would use the term?

* The "growth" charts on the full study are just as bad - how much of that is just from renaming existing generic developer positions, since "data engineer" is clearly a relatively new term?

sportanova9y ago

6500 data engineers on all of LinkedIn, but 6600 job openings in the Bay Area. So there are more job openings in one area than data engineers on all of LinkedIn.

binalpatel9y ago· 1 in thread

The fact that the original, unmodified article referred to data engineers as "janitors" pretty much says it all.

In my opinion there shouldn't be a delineation. You shouldn't be a data scientist if you can't gather, process, and clean up your own data.

biztos9y ago

Ideally you'd have a symbiosis, and each side would recognize the importance of the other.

Plus as a side effect, your software will probably be better.

otto_ortega9y ago· 1 in thread

Am I the only one who thinks there will be a ton of people changing their job title on LinkedIn to "Data Engineer" as a result of this article?

collyw9y ago

I am thinking about it. Actually a friend recommended that I change my title to Data Engineer a few months back.

collyw9y ago· 1 in thread

njd9y ago

- The project 'data engineer', in today's world, most likely will be a software developer responsible for ETL, etc. The data design will be more or less up to the software developer.

tom_b9y ago

rch9y ago

jboggan9y ago

It's digital Charlie Work [0], that's why.

One advantage of data engineering: unlike front-end work, there are few non-technical people who will have an opinion on how you are doing things and burden you with bikeshedding.

[0] - http://www.avclub.com/tvclub/its-always-sunny-philadelphia-c...

ThePhysicist9y ago

So why not change your LinkedIn job title to "data plumber", which is sure to get you some serious recruiter attention ;)

untilHellbanned9y ago

Ahh the ol' write a post about a not well understood distinction and then proceed to not explain the distinction.

Looks like we need more English engineers too.

cutler9y ago

I'm puzzled at the omission of Scala and Spark in this report.

lifeisstillgood9y ago

Weirdly the problem is most hires have it backwards.

The longer an organisation has existed the more out of whack with the market its internal set up is.

As such companies decide on their price point first, then go looking. Which is of course backwards.

realworldview9y ago

We surely need data mechanics.

slantedview9y ago

These "shortage" stories always make me roll my eyes, because they're usually about money more than anything. And money is usually about cost of living more than anything.

If you choose to locate your company in one of the highest cost of living regions in the world, then you are complicit in the "shortage". Supply and demand - pay up. Or don't.

cheriot9y ago

And how many companies are still interviewing with fizzbuzz?

LawrenceHecht9y ago

Just checked: the # of data engineers rose to 9,246 (up 42%) in the last six months. So, the shortage is at least being addressed by people changing their job titles on LinkedIn.

wpiel9y ago

What I've learned from the comments: If something is valuable, there is a shortage of it.

I'm not even sure if I'm being sarcastic.

edoceo9y ago

We hire only the best! We only hire the top 1% of candidates.

But only 1 out of 100 are qualified :(
