I've worked with very large amounts of data and high performance computing for most of my career; I mostly had finance related jobs in the last decade or so. I have most of the skills you want, including some you don't know you want. However, when salary comes up, that is where we start to part ways. If you are really serious about a shortage, you should be really serious about making offers that can be competitive, but I keep seeing the same $150k offers. That isn't a "shortage" kind of offer.
I've been looking for work in data engineering and databases for 9 months, and while I'm certainly not as qualified and experienced as you are, I consider myself capable. I've definitely passed the take home and whiteboard tests I've been given, etc.
When I read about a "shortage," I wonder if this is more indicative of unicorn searching than anything else.
Once a resume gets to me, and I'm only speaking for myself here, I'm looking for the challenges you've faced and the problems you've solved. I actually care very little about what tech you used because odds are we'll have something different, but we'll need to solve problems. If someone is solid in some related technical skillset, can think critically, and communicate the details of what they've tackled in the past, learning our specific tech stack is going to be the easy part.
Let me put it another way - when I look for interns or entry level hires, the number of those that can do more than spell SAS or Teradata approaches zero very quickly. But if they've solved challenges of the magnitude that they'd be expected to solve with us initially, the tech is secondary to process and problem solving. As we look more experienced, I'd still be limiting myself to candidates from a set of "legacy" industries that prefer these sorts of tools if I insisted on checking those boxes at the outset. I'd prefer to teach a really smart person to use the things that they don't know yet if I have it my way.
I don't fancy it up too much, either. I build teams that make the data move and land it clean so that your PhDs can do the smaaaht stuff with it. I can stack BI and Analytics on top, but a lot of people can do that starting from clean data - and clean data is what I do. But I do get the impression that we're viewed as janitors and plumbers - who you'd be thrilled to see at 3am when your shit(ter) broke, right?
Yes, you can always always always find somebody to do a job if you are willing to pay 10 million dollars. That means that "shortages" are impossible. It means that you can never have a shortage in any situation, because you can always pay 10 million dollars for a single visit to the doctor.
But this line of logic isn't very useful when talking about "shortages".
If you had to pay a million dollars for a loaf of bread, is there a shortage of bread? I.e., billions of people will starve to death by next week, because they can't afford to buy food.
Most people would say "Yes, there is a shortage of bread".
When people talk about shortages, they are obviously talking about a shortage at a certain price point. There is no other definition of the word shortage that makes sense.
A good definition that I use for the term shortage is "If the government could snap its fingers and instantly produce large amounts of X overnight, would the world be a better place?"
If the answer is "Yes, the world would be in a much much better place", then that means there is a shortage of X. If the answer is "No, the world would only be a little better", then that means that there is NOT a shortage of X.
A company found something it could profit from more if it paid less than current market value. That is all. They are not saying there are no qualified applicants. They are not saying they want 10 million dollars.
What is the case is that a business finds a resource (the perfect hire) that they wish to profit from but do not want to pay the market value for it because that would reduce profits. Rather than be satisfied with what would be an erosion of profit (or an admission of an unworkable business model), articles are posted to demand that government pressure wages downward.
If you want a bread analogy, it's as if I found a cheap source of bread I can sell elsewhere at a profit but then complain there's a shortage solely because the cheap stuff isn't even cheaper.
If there was 1 gallon of water left on earth, Bill Gates would buy that gallon for $50 billion, and everyone else would die of dehydration.
There has always been a shortage of maids willing to do all my house work for $10.
And there is a shortage of data engineers at $x, but there wouldn't be a shortage at $1M/year (because fewer companies would want one, and more people would be willing to do the work).
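To make the "shortage at a price point" idea concrete, here is a toy sketch. The linear `demand` and `supply` curves and every number in them are invented purely for illustration; nothing here is real labor market data.

```python
def demand(salary: int) -> int:
    """Hypothetical number of openings companies want to fill at this salary."""
    return max(0, 12_000 - salary // 25)


def supply(salary: int) -> int:
    """Hypothetical number of qualified people willing to work at this salary."""
    return salary // 50


def shortage(salary: int) -> int:
    """Unfilled openings at this salary (zero when supply meets or exceeds demand)."""
    return max(0, demand(salary) - supply(salary))


for offer in (100_000, 150_000, 200_000, 1_000_000):
    print(f"${offer:>9,}: demand={demand(offer):>6,} supply={supply(offer):>6,} "
          f"shortage={shortage(offer):>6,}")
```

With these made-up curves the shortage shrinks as the offer rises and disappears once supply catches up with demand, which is the point being argued: "shortage" only means something relative to a price.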
Really? Who would sell the last gallon of water on earth?
1. I have student debt from my law degree, and I have a lower risk tolerance until that's paid off. 2. My daughter is 4, it's nice to be around for the early years, and the corporate gig is quite comfortable in terms of hours. 3. I'm in Maine. Most clients would require me to travel, which impacts #2.
I do have a former colleague here that started a data consultancy. I should grab a beer with him and see if we have common ground in the short term. It's not quite starting your own thing, but it might be fun.
Thought experiment: If 100 companies had openings for a skill set that only one person could deliver, all 100 companies could eventually fill their openings by sequentially outbidding each other for the services of that one person.
So how would we know if a talent shortage really exists for a certain job? I can think of a couple potential hints: if starting salaries are going up much faster than the national average, or if the unemployment rate for that job is much lower than the national unemployment rate. Either would seem to indicate that, relative to the job market as a whole, there was a greater demand than supply for that particular job.
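Those two hints can be written down as a simple check. This is only a sketch: the `LaborStats` type, the function name, and especially the thresholds for "much faster" and "much lower" are my own assumptions, and the figures in the example are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LaborStats:
    salary_growth: float       # yearly growth in starting salaries, as a fraction
    unemployment_rate: float   # unemployed share of that workforce, as a fraction


def shortage_hints(job: LaborStats, national: LaborStats,
                   growth_ratio: float = 2.0,
                   unemployment_ratio: float = 0.5) -> List[str]:
    """Return the hints (if any) that point to demand outrunning supply.

    growth_ratio and unemployment_ratio are arbitrary cutoffs (assumptions)
    for "much faster" and "much lower" than the national figures.
    """
    hints = []
    if job.salary_growth > growth_ratio * national.salary_growth:
        hints.append("starting salaries rising much faster than the national average")
    if job.unemployment_rate < unemployment_ratio * national.unemployment_rate:
        hints.append("unemployment much lower than the national rate")
    return hints


# Invented numbers, for illustration only.
national = LaborStats(salary_growth=0.03, unemployment_rate=0.05)
some_job = LaborStats(salary_growth=0.10, unemployment_rate=0.01)
print(shortage_hints(some_job, national))
```

An empty list doesn't prove there is no shortage; it just means neither hint fired at the chosen cutoffs.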
1. physicists
2. Wall St. quants
3. game programmers
4. PhD statisticians
So, the problem is not that there aren't 6,600 people in the US that can do it, it's that the companies can't pay or don't want to pay the $200,000 + that would be required to hire them.
However if people start liking kale, and the price goes up 20% and you start telling people about the massive kale shortage people will think you're being a little histrionic.
Alternatively, you can just join a large tech org. Netflix etc. have no problem paying good DEs north of $200k in total comp.
Also Finance requires proper education and training. Not so much for App development. So for everyone who complains about getting $150K offers, there are a hundred thousand people right here in the US applying for $60K technical analyst jobs.
And they don't have finance/Google/Facebook level needs for data engineers. They can't reasonably claim to need top-level skills and then beggar out on the cost.
That's true for just about anything.
"there is no epipen crisis, only a crisis at what you are willing to pay"
"There is no poverty , only poverty at a given income level"
"there is no crime problem, only crime problem at a given crime level"
what you are saying is self-contradictory. If you (or others) are able to turn down 150K offers...you know what you are.
Poverty is simply a description of wealth and is always comparative. We can define poverty at any level we so desire.
One might argue that any crime is a problem, as long as it causes an issue for society or victims.
I have been thrown these projects at work before, where I'm the frontend engineer and I need to make some cool D3 visualization, but lo and behold the data is shit, and I have to help the backend team make the data usable. It's a mind-numbing job that nobody wants, because it sounds like a one month task to get a good REST API up and working, but it usually takes three months, because you have to go back and forth making sure the data is right, and there are always 10 tricky edge cases that you have to work some magic on. Not only that, but you need to have smart people cleaning the data, so that you don't make some big mistake down the line or end up with a super slow REST API and have to add another couple of weeks or a month to rework the data again. So that one month becomes three months, and most likely a year, because somebody will say that looks great but can we also add this, and it goes on and on. It's literally a mind-numbing job that almost nobody wants. I have found that products like Tableau are the best for this; you still have to clean the data, but it helps speed up the process.
Data cleaning is a super golden problem to solve.
Give me emacs and a command line, and I have all the truth I need, which is far more honest, in my mind, than anything that can be created with D3 or Tableau. Beauty is in the eye of the beholder, and it doesn't really do anyone service to look down on the work others find enjoyable. If doing D3 makes you happy, that is awesome, and I can only congratulate you for your passion and your ability to look forward to work I don't "get," and I wish the feelings would be mutual.
Which are difficult to find when you think of them as "janitors", and treat them accordingly.
Hm, I wonder why he's having problems hiring janitors.
I guess this means that the entire profession consists of janitors and plumbers.
The job didn't involve too many "pipelines" but the knowledge and creativity required to make them work was well above what I see from most software developers.
"Plumber" is not the put-down that poster thought it was.
In a boldface font, no less. The cockiness behind that language is really quite astounding.
Edit: Favorite paper on the topic: http://research.google.com/pubs/pub43146.html
For a long time there has been hype around new technology and labels for business intelligence, data warehousing, big data, and now data engineering/science. I'm not saying there are not some roles in this space that return huge value to organizations, but that these opportunities are much rarer than the buzz indicates.
I wonder if the perceived shortage is mainly hype as the shift to new cloud technologies makes many of the older ideas a little less useful - if you are plowing data into BigQuery, you probably aren't so worried about your star schema data model for reporting.
I would strongly advise people that look at these types of articles to look at the roles in question and ask "Is this role on the critical path to customers paying us?" My experience has been that the answer is often "No." This is bad. I have also seen situations where businesses that do rely on smart data integration can show that they are selling dollar bills for ten cents that still have trouble getting customers on board with spending that ten cents. Business is weird.
From my P.O.V., "Full Stack Engineer" is a place you don't want to be because it means putting out fires with whatever junk javascript is in the front end. It seems like everybody who's built a serious javascript application has invented their own Virtual DOM because none of the popular Virtual DOM libraries are good for much other than wasting time and CPU cycles.
"Data Scientist" is a bad title in its own way, in the sense that "Computer Science" is bad, but worse. To a lot of people there is a Brahmin kind of attitude associated with "Scientist" -- i.e. an aversion to getting your hands dirty. Real world data is pretty dirty and you aren't going to get far in getting value out of it unless you spend 80-90% of your time dealing with the dirt.
Frankly speaking, if your company doesn't need a data engineer, it won't hire one or move you into that role. They likely don't, either, if you're experiencing this pushback -- data engineers often develop ETL pipelines or data warehouses, both of which are very useful if your company has a data team and very useless if it does not.
That said, you may want to move closer to my role. There's actually a shortage of data-savvy people who can also write production software, and you would nicely complement a more research-inclined data scientist or analyst -- someone with far more experience with research/analysis than development.
I experience the same problem with shortage-at-price-X in the field you describe. I'm a machine learning engineer with experience in MCMC methods, but I also have a lot of low-level Python and Cython experience, some intermediate experience with database internals, and lots of experience writing well-crafted code for production systems.
There are basically zero companies willing to pay what I'm seeking (which is a salary based on my previous job and a few offers I got around the time I took that job). In fact, in some of the more expensive cities, the real wage offered is far lower than other markets.
I've seen reputable, multi-billion dollar companies offering in the $140k range for this type of role in New York. That's wildly below anything reasonable for this sort of thing in New York. I've seen companies in Minneapolis offering $130k for the same kind of job -- and even that is still too low for Minneapolis! The same has been true in San Francisco as well.
Because these companies value you more for simply looking good on paper and looking good as a piece of office ornamentation when investors stroll through, and they view you as an arbitrary work receptacle closer to a software janitor than a statistical specialist, their whole mindset is about how to drive wages down.
Frankly, given the stresses of the job and the risk of burnout, I think it's actually a terrible time to be in the machine learning / computational stats employment field, despite all of the interesting new work and advances being made. The intellectual side is good, but the quality of jobs is through the floor.
Well no kidding, that's one person doing two jobs. That's easily a 5-10 year training time depending on how high a quality you demand from their production software.
I'll look again at the board, but I didn't see anything there before that wanted software engineering skills (which I have with industry experience), and not a graduate degree (which I don't), and happened to be commutable from my place just south of the bay. But I will keep looking!
As others have pointed out Data Engineering is more about building data pipelines, making architecture decisions for your ML stack, things like that. Less about model building, prototyping and training, which is what I think of when somebody says they 'do' ML.
It seems people in this industry refuse to understand that some people are not perfect. I never graduated college because I hated it with the very fiber of my being, so I am not particularly great at whiteboarding answers to algorithm questions off the top of my head in a high pressure environment. If I need them during my job, I look up answers and learn from people who are much smarter than I am.
My personal identity has been shattered, as I thought my ~5-10 year history of success in the industry indicated I was in demand and talented. I saw posts like this and thought that if the worst happened I'd still be able to find a job. The idea that there is a talent shortage is a lie, or candidates like me wouldn't be treated as I have been. I'm not asking for a free job, or a handout. I have had a successful career so far and am capable of doing good work. But I'm not a specialist in Big Data Machine Learning Neural Networks.
I have struggled with bipolar disorder and suicidal ideation most of my life. I've dealt with the death of my beloved grandmother and my father who was instrumental in my choosing to be an engineer with only minor lapses in control. Nothing has caused me to consider taking my own life as much as the past 6 months. It seems there is no future for me in the only career I have any skill in and which is a huge part of my identity. And to constantly be told that there is such a shortage of engineers only salts the wound.
The fact that you pulled through 25 of them is already commendable. Unfortunately as a labor provider you'll be subjected to all kinds of crap for the privilege of working.
Every single person on here needs to have a secondary business going on right now. Doesn't have to be a highly skilled industry either, selling hand made stuff on Etsy can be a lifeline in these situations.
I had always had an easy time getting a job before but this time it was different. Granted I knew it'd be tougher since for remote jobs, the world is the competition. But it was a summer of endless shitty timed hackerrank-style tests (virtual whiteboard hazing). I would tell my co-workers about them and they'd laugh in bewilderment at the questions that were asked in what should be a technical screener, and these are extremely smart and productive software guys that have started companies, written books, give conference talks. One funny question I got for a frontend React job: write a function that takes a sequence of bits that represent a negative-binary number (not a base-2 number that is negative, but a base-(-2) number) and return its negated value in base-2. For a frontend job. It was one of 4 questions to be answered in 90 minutes. gtfo.
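For the curious, here is roughly what that base-(-2) question is asking for. This is a sketch under my own assumptions, since the original problem statement isn't reproduced here: bits are given least-significant first, and the negated result is returned in base -2 as well. The function names are made up.

```python
from typing import List


def from_negabinary(bits: List[int]) -> int:
    """Decode base -2 digits (least-significant first) into an ordinary int."""
    return sum(bit * (-2) ** i for i, bit in enumerate(bits))


def to_negabinary(n: int) -> List[int]:
    """Encode an int as base -2 digits, least-significant first."""
    if n == 0:
        return [0]
    bits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:
            # Python's remainder takes the sign of the divisor (-2), so it can
            # be -1; shift it into {0, 1} and carry into the quotient.
            n, r = n + 1, r + 2
        bits.append(r)
    return bits


def negate_negabinary(bits: List[int]) -> List[int]:
    """Negate a base -2 number by going through an ordinary int."""
    return to_negabinary(-from_negabinary(bits))


nine = [1, 0, 0, 1, 1]             # 1 + 0 + 0 - 8 + 16 = 9
print(negate_negabinary(nine))     # digits of -9 in base -2
```

Going through an ordinary int is the lazy route. A "clever" answer would negate digit by digit with carries, which is exactly the kind of thing that is hard to produce under a timer and irrelevant to a frontend job.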
A few companies would reply, most strung me along while -- I realize now -- they were keeping me as a backup(-backup) in case their "A-player" turned them down. Countless interviews, hours on takehome projects, it was tough. I learned to cut bait if the company was slow to move forward, had weeklong periods of no communication, etc.
I (just very recently) found it's easier to land small contract gigs, because the barrier to entry seems to be lower, then demonstrate value and keep getting work from those guys after the initial project is done. It is different but so far I actually like the freedom that comes with contracting. I haven't been at it long enough to experience the downsides.
There's definitely not a shortage of talent. It's that every company thinks they need "A-players", when the vast, vast majority are doing a damn basic CRUD app.
Just wanted to say I hear you brother and share my story in some solidarity. You will find something, just keep plugging away. Each "failed" attempt makes you better no matter how many attempts it takes. Cliche of course but it is true. I am very lucky in that I don't face the mental demons you do, even then this job search hit me pretty hard. Please be proactive and take care of yourself, body and mind (body goes a long way toward mind also).
I've been dealing with large data since before "big data" was a word, but I don't call myself "data scientist" or "data engineer". I am still a software engineer working on what benefits my organization.
"Serial Entrepreneur" is the same these days, claimed by anyone who had a lemonade stand as a kid.
But if you saw a nearby local maximum that's higher than your current local maximum, wouldn't you change what you call yourself, if it means being paid more but doing the same work?
This is similar to how the average "software engineer" makes about $30k/year more than the average "programmer".
I really enjoy that kind of work but it is difficult to articulate your business value in that environment. The best thing is working closely with a data scientist/front-end dev who can deliver products to the analysts and executives that need the data and make sure that you get the credit for enabling new streams of data. But most of the time you are putting out someone else's dumpster fire.
One advantage of data engineering: unlike front-end work, there are few non-technical people who will have an opinion on how you are doing things and burden you with bikeshedding.
[0] - http://www.avclub.com/tvclub/its-always-sunny-philadelphia-c...
* How many aren't on LinkedIn?
* Since the whole article is about how the job title is poorly defined and growing in prevalence, why would you assume that people who don't already have such a job would use the term?
* The "growth" charts on the full study are just as bad - how much of that is just from renaming existing generic developer positions, since "data engineer" is clearly a relatively new term?
It's very analogous to front-office and back-office work in Investment Banking. "Data Scientist" are the front-office, with all the prestige, and "Data Engineers" are the back-office, doing a lot of the heavy lifting without nearly as much recognition.
In my opinion there shouldn't be a delineation. You shouldn't be a data scientist if you can't gather, process, and clean up your own data.
Even if you require your data scientists to be able to do engineering work, it's probably way more efficient to have some good generalist Software Engineers doing all the "pre-math" work and freeing your statisticians up for what they're (hopefully) good at.
Plus as a side effect, your software will probably be better.
So why not change your LinkedIn job title to "data plumber", which is sure to get you some serious recruiter attention ;)
Looks like we need more English engineers too.
I'm starting to think that the message is if HR is going to do checklists then developers should really make sure they work mostly with contracts that use popular checklist items.
I can think of Facebook, Google, Microsoft, IBM (which locations and groups within these companies / where?). I can also think of Confluent, CitusDB, Databricks, etc.
Which is what the poster was asking for.
Before going out to the market and discovering what talent exists and consequently what salary it will take to get them to join (i.e. negotiate), most organisations decide on a salary range, usually reflecting the current internal structure, not the current external market.
The longer an organisation has existed the more out of whack with the market its internal set up is.
As such companies decide on their price point first, then go looking. Which is of course backwards.
If you choose to locate your company in one of the highest cost of living regions in the world, then you are complicit in the "shortage". Supply and demand - pay up. Or don't.
From my experiences working in various contexts (applied machine learning, analytics, policy research, academics, etc...), there are several factors that contribute to this shortage: (1) "data engineering" often requires a lot of breadth and knowledge, (2) "data engineering" is often (derisively and naively) referred to as the "janitorial work" of data science, (3) the spectrum of roles and requirements within the "data engineering" domain, in terms of job descriptions, can range from database systems administration, to ETL, to data warehousing, curation of data services / APIs, business intelligence, to the design/deployment/operation of pipelines and distributed data processing and storage systems (these aren't mutually exclusive, but often job descriptions fall into one of these stovepipes).
Some of my quick thoughts and anecdata:
Companies have made large investments in creating 'data science' teams, and many of those companies have trouble realizing value from those investments.
A part of this stems from investments and teams with no tangible vision of how that team will generate value. And there are several other contributing factors…
"Dirty work." People haven't learned how to, and more often don't want to do it. There's a vast number of tutorials and boot camps out there that teach newcomers how to "learn data science" with clean datasets -- this is ideal for learning those basics, but the real world usually does not have clean or ideal datasets -- the dataset may not even exist -- and there are a number of non-ideal constraints.
There are people that wish to call themselves “data scientists” that “don’t want to write code” and would “prefer to do the analysis and storytelling”.
Engineering as the application of science with real world constraints: there are a number of factors that we take into account, often acquired through painful experience, that aren’t part of these tutorials, bootcamps, or academic environments.
Many “data scientists” I’ve met have a hard time adapting to and working with these constraints (e.g. we believe that the application of data science would solve/address __ problem, but: how do we know and show that it works and is useful? what are the dependencies, and costs of developing and applying that solution? is it a one-time solution, or is it going to be a recurring application? does the solution require people? who will use it? what are the assumptions or expectations of those operators and users? is it suitable? is it maintainable? is it sustainable? how long will it take? what are the risks involved and how do we manage them? is it re-usable, and can we amortize its costs over time? is it worth doing? This is part of a methodology that comes from experience, versus what is taught in data science)
Larger teams with more people/financial/political resources can specialize and take advantage of these divisions of labor, which helps recognize the process aspects of applying data science and address some of the above.
Short story: if you view data engineering as "janitorial work" you're missing the big picture
Anyone else notice that the attributes of a 'unicorn' data scientist include the traits of a 'data engineer?'
Someone with enough smarts to build/lead a team, sell to executive management, and have an actual business application is just too rare compared to the prevalence of the engineering talent.
And how many companies are still interviewing with fizzbuzz?
- The project 'data engineer', in today's world, most likely will be a software developer responsible for ETL, etc. The data design will be more or less up to the software developer.
- An enterprise 'data engineer' is more concerned with data that affects the enterprise. This typically involves some sort of data integration. For example, how to integrate relevant data from N projects (e.g. A,B,C .. Z) where each project has its own idea of how to represent similar concepts (e.g. person, user, customer), with different provenance, truth assertions, access rules, data retention periods, granularity of metadata (e.g. at the attribute level vs entity level), etc. The enterprise is interested in questions like 'What did we know and when did we know it?', etc. The enterprise 'data engineer' will probably levy requirements on the project 'data engineer' to meet the enterprise's needs.
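A minimal sketch of the enterprise side, assuming two hypothetical projects that name the same person fields differently. The project names, field names, the use of a normalized email as the shared key, and the `Assertion` shape are all invented for illustration; real integrations also have to deal with access rules, retention, and much messier matching.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Assertion:
    entity_key: str    # enterprise-wide key (assumption: normalized email)
    attribute: str     # canonical attribute name
    value: str
    source: str        # provenance: which project asserted it
    asserted_on: date  # when the enterprise learned it


# Hypothetical mappings from each project's field names to canonical ones.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "project_a": {"person_email": "email", "person_name": "name"},
    "project_b": {"user_mail": "email", "display_name": "name"},
}


def to_assertions(source: str, record: Dict[str, str],
                  asserted_on: date) -> List[Assertion]:
    """Translate one project record into attribute-level assertions."""
    mapping = FIELD_MAPS[source]
    canonical = {mapping[k]: v for k, v in record.items() if k in mapping}
    key = canonical["email"].strip().lower()
    return [Assertion(key, attr, val, source, asserted_on)
            for attr, val in canonical.items()]


def known_as_of(assertions: List[Assertion], entity_key: str,
                when: date) -> Dict[str, Assertion]:
    """'What did we know and when did we know it?' per attribute, as of a date."""
    latest: Dict[str, Assertion] = {}
    for a in sorted(assertions, key=lambda a: a.asserted_on):
        if a.entity_key == entity_key and a.asserted_on <= when:
            latest[a.attribute] = a  # later assertions overwrite earlier ones
    return latest


facts = (
    to_assertions("project_a",
                  {"person_email": "Ann@Example.com", "person_name": "Ann"},
                  date(2016, 1, 1))
    + to_assertions("project_b",
                    {"user_mail": "ann@example.com ", "display_name": "Ann Lee"},
                    date(2016, 6, 1))
)
snapshot = known_as_of(facts, "ann@example.com", date(2016, 3, 1))
print(snapshot["name"].value, "from", snapshot["name"].source)
```

Keeping provenance and dates at the attribute level rather than the entity level is what lets the as-of question be answered at all, and it is the sort of requirement the enterprise side ends up levying on each project.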
I'm not even sure if I'm being sarcastic.
But only 1 out of 100 are qualified :(