Not only do I need to perform at the level expected of a Software Engineer for the position (with your standard leetcode-style interviews), but I also need to pass extra ML-specific rounds covering both theory and practice. Meanwhile, the vast majority of my work consists of getting systems production-ready and hunting bugs.
If I have to jump through so many hoops when changing jobs, I'll seriously consider a regular non-ML position.
I went through all that and am a SWE again instead of an ML engineer. The one thing I learned from all that? "The very best models are distilled from postdoc tears".
I feel the biggest problem for people without an ML background is that you'd think "I don't know what I'm doing, I can't get hired for this job!" But the fact is that people with ML backgrounds mostly don't know what they're doing either. They just get standard results by applying standard libraries; any programmer with some math skills could do the same. It is no harder than learning a frontend or backend framework; people just assume it's harder, so they lack confidence about it. There are some gotchas you have to learn, but there are plenty of gotchas in both backend and frontend work as well.
Then again, my company's business model leads to terrible hires anyway.
What about asking for more money at the end? A complex multi-stage interview process eliminates more candidates. Some, as you say, will opt for a developer gig instead, probably because ML wasn't something they were interested in to begin with. That narrows the list of candidates even further. Either "play the game" and ask for more money, or don't play the game at all. Let employers pay extra for polished candidates.
One startup asked me this. They gave me a very vague problem statement, and in 2 days I had to find a couple of recent articles relevant to the problem and prepare a presentation explaining my solution and justifying my decisions.
During the Cold War, the U.S. developed a speech-to-text (STT) algorithm that could theoretically detect the hidden dialects of Russian sleeper agents. These agents (Fig. 3.7) were trained to speak English in Russia and subsequently sent to the US to gather intelligence. The FBI was able to apprehend ten such hidden Russian spies and accused them of being "sleeper" agents.
The algorithm relied on the acoustic properties of the Russian pronunciation of the word (v-o-k-s-a-l), which was borrowed from the English V-a-u-x-h-a-l-l. It was alleged that it is impossible for Russians to completely hide their accent, and hence when a Russian said V-a-u-x-h-a-l-l, the algorithm would yield the text "v-o-k-s-a-l". To test the algorithm at a diplomatic gathering where 20% of participants are sleeper agents and the rest Americans, a data scientist randomly chooses a person and asks them to say V-a-u-x-h-a-l-l. A single letter is then chosen randomly from the word generated by the algorithm, and it is observed to be an "l". What is the probability that the person is indeed a Russian sleeper agent?
base odds: 20:80 = 1:4
likelihood ratio: (1 "l" in 6 letters) : (2 "l"s in 8 letters) = (1/6) : (1/4) = 2:3
posterior odds: 1:4 × 2:3 = 2:12 = 1:6
final probability: 1/(1+6) = 1/7, or roughly 14.3%
Bayes' rule with raw probabilities is a bit more involved. Let A = the event they are a spy, B = the event that an "l" appears, and let ^c denote the complement of these events. Then,
P(A) = 1/5
P(A^c) = 4/5
P(B|A) = 1/6
P(B|A^c) = 1/4
P(A|B) = P(B|A)P(A)/P(B)
By law of total probability,
P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
Which is a very standard formulation, and really just your equation, as you can rewrite everything I have done as:
P(A|B) = 1/(1 + P(B|A^c)P(A^c) / (P(B|A)P(A)))
Which is the base odds, posterior odds, and odds-to-probability conversion all in one. The reason this method is strictly better, in my opinion, is that the odds version breaks down if we introduce a third type of person who doesn't pronounce l's at all. Also, after doing one homework's worth of these problems, you just skip to the final equation, in which case my post is just as short as yours.
With S = sleeper, and L = letter L, and remembering "total probability":
P(L) = P(L|S)P(S) + P(L|-S)P(-S),
(where -S is not S), we have by Bayes P(S|L)
= P(L|S) P(S) / P(L)
= P(L|S) P(S) / (P(L|S)P(S) + P(L|-S)P(-S))
= 1/6 * 1/5 / (1/6*1/5 + 1/4*4/5)
= 1/30 / (1/30 + 6/30)
= 1/7

Some Japanese-American soldiers would be SOL, though.
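Both routes to 1/7 above can be checked mechanically with exact rational arithmetic (a quick sketch; the numbers come straight from the problem statement):

```python
from fractions import Fraction

# Priors: 20% sleeper agents (S), 80% Americans.
p_s = Fraction(1, 5)
p_not_s = Fraction(4, 5)

# Likelihoods of a random letter being "l":
# "voksal" has 1 "l" in 6 letters; "Vauxhall" has 2 "l"s in 8 letters.
p_l_given_s = Fraction(1, 6)
p_l_given_not_s = Fraction(2, 8)

# Law of total probability, then Bayes' rule.
p_l = p_l_given_s * p_s + p_l_given_not_s * p_not_s
p_s_given_l = p_l_given_s * p_s / p_l

# Odds form: prior odds (1:4) times the likelihood ratio (2:3).
posterior_odds = (p_s / p_not_s) * (p_l_given_s / p_l_given_not_s)

print(p_s_given_l)     # 1/7
print(posterior_odds)  # 1/6
```

Both formulations agree, as they must: posterior odds of 1:6 convert to a probability of 1/7.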
The "model is in place, but I have no clue what it's doing, so it can fail without me understanding when and how" argument is a straw man. Especially for supervised learning, where we have a label for each data point, it is immediately clear whether the output of the model is bunk, useless, or even harmful. There is no "fail silently by design".
I have been working in the field for almost 20 years, in academia and in industry, and it is not that I start every PCA thinking about eigenvectors and eigenvalues; if you asked me now, without preparation, what those are, I would land somewhere between approximately right and wrong. But I have fit many, many very accurate models.
That's why there's so much iteration and feedback gathering (e.g. A/B tests) as a part of DS/ML, which incidentally is rarely a part of the interview loop.
Anyone who claims they can get a good model the first time they train it is dangerously optimistic. Even the "how it works" aspect has become more and more marginal due to black boxing.
My guess would be that more machine learning projects go off the rails for want of understanding the data or the {business, research} problem.
People say things like "you need to know how it works" but "it" doesn't work using your knowledge of eigenvectors. If you want to test how "it" works, test that, literally. Put up a model on the board and a dataset. Ask people about what might happen when you apply one to the other. What changes they would make in response to changes in the data. What they would do in response to the following training curves, budget limitations, etc.
These interviews are terrible and they select for people that regurgitate facts.
But there is a threshold where it stops being a test of foundational knowledge and starts being a test of arbitrary trivia, and it favors whoever has the most free time to study and memorize said trivia.
It’s really looking like another rat race. Especially since there’s no central authority, every hiring manager has the potential to invent their own filter, and make it arbitrarily harder or easier based on supply and demand (and then the filter drifts away from the intended purposes).
Exactly. Whenever eigenvectors come up during interviews, it’s usually in the context of asking a candidate to explain how something elementary like principal components analysis works. If they claim on their CV to understand PCA, then they’d better understand what eigenvectors are. If not, it means they don’t actually know how PCA works, and the knowledge they profess on their CV is superficial at best.
That said, if they don’t claim to know PCA or SVD or other analysis techniques requiring some (generalized) form of eigendecomposition, then I won’t ask them about eigenvectors. But given how fundamental these techniques are, this is rare.
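To make the PCA-eigenvector link concrete, here is a minimal numpy sketch (synthetic data, variable names mine): the principal components are the eigenvectors of the covariance matrix, and you get the same answer from an SVD of the centered data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 3-D data with correlated, unequal-variance features.
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.1]])
Xc = X - X.mean(axis=0)  # PCA requires centered data

# PCA via eigendecomposition of the covariance matrix.
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)            # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending

# The same components via SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = s**2 / (len(Xc) - 1)

# Explained variances agree; eigenvectors agree up to sign.
assert np.allclose(eigvals, svd_vals)
assert np.allclose(np.abs(eigvecs), np.abs(Vt.T), atol=1e-6)
```

A candidate who claims to know PCA should be able to explain why the two halves of this sketch coincide.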
People know pity passes exist for Master's degrees. You can't trust that someone actually knows what they should know just because they have a degree. Ditto professional experience. The entire reason FizzBuzz exists is that people with years of professional experience can't program.
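For anyone unfamiliar, FizzBuzz is roughly this (a minimal sketch; any number of equivalent formulations exist):

```python
def fizzbuzz(n: int) -> str:
    """Return "Fizz" for multiples of 3, "Buzz" for multiples of 5,
    "FizzBuzz" for multiples of both, and the number itself otherwise."""
    out = ""
    if n % 3 == 0:
        out += "Fizz"
    if n % 5 == 0:
        out += "Buzz"
    return out or str(n)

# fizzbuzz(3) == "Fizz", fizzbuzz(5) == "Buzz",
# fizzbuzz(15) == "FizzBuzz", fizzbuzz(7) == "7"
print([fizzbuzz(i) for i in range(1, 16)])
```

That this trivial filter screens anyone out at all is the point of the comment above.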
On top of that, these problems are often poorly selected, poorly communicated, conducted under completely unrealistic time pressure, often as pile-ons (with 3-4 strangers, as if just to add pressure and distraction), and these days over video conferencing (so you have to stare into the camera and pretend to make eye contact with people while supposedly thinking about your problem, on top of shitty acoustics), etc, etc.
It's just fucking ridiculous.
Just as in programming, the world is full of people who can recite facts but don't understand them. There is no point in asking what an L1 norm is and asking for its equation. Or say, giving someone the C++ code that corresponds to computing the norm of a vector and asking them "what does this do". Or even worse, showing them some picture of some cross-validation scheme and asking them to name it. Yes, your candidates should be able to do this, but positive answers to these kinds of questions are nearly useless. These are the kinds of questions you get answers to by Googling.
It's far more critical to know what your candidate can do, practically. Create a hypothetical dataset from your domain where the answer is that they need to use an L1 norm. Do they realize this? Do they even realize that the distance metric matters? Are they proposing reasonable distance metrics? Do they understand what goes wrong with different distance metrics? etc. Or problems where they need to use a network but say, padding matters a lot. Or where the particulars of cross validation matter a lot.
This also gives you depth. "name this cross validation scheme" gives you a binary answer "yes, they can do it, or no they can't" And you're done. If you have a hypothetical dataset, you can keep prodding. "Ok, but how about if I unbalance the data" or "what if we now need to fine tune" or "what if the payoffs for precision and recall change in our domain", "what if my budget is limited", etc. It also lets you transition smoothly to other kinds of questions. And to discover areas of deeper expertise than you expected. For example, even for the cross validation questions, if you ask that binary question, you might never discover that a candidate knows about how to use generalized cross validation, which might actually be very useful for your problem.
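A toy illustration of "the distance metric matters", in the spirit of the hypothetical-dataset questions above (made-up data, helper names mine): with a gross outlier present, the L1-optimal summary of a dataset (the median) and the L2-optimal summary (the mean) behave very differently.

```python
import numpy as np

def l1_dist(a, b):
    # Manhattan distance: sum of absolute coordinate differences.
    return np.sum(np.abs(a - b))

def l2_dist(a, b):
    # Euclidean distance.
    return np.sqrt(np.sum((a - b) ** 2))

# A toy 1-D dataset with one gross outlier.
x = np.array([1.0, 1.1, 0.9, 1.05, 100.0])
mean, median = x.mean(), np.median(x)  # 20.81 vs 1.05

# Total distance from every data point to each candidate summary.
l1_at_mean = l1_dist(x, np.full_like(x, mean))
l1_at_median = l1_dist(x, np.full_like(x, median))
l2_at_mean = l2_dist(x, np.full_like(x, mean))
l2_at_median = l2_dist(x, np.full_like(x, median))

# The median wins under L1 (robust to the outlier);
# the mean wins under L2 (it minimizes squared error).
assert l1_at_median < l1_at_mean
assert l2_at_mean < l2_at_median
```

A candidate who can predict which assertion holds, and why, understands something no "name this norm" question would ever surface.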
The uninformative tedious mess that we see in programming interviews? This is the equivalent for ML/DL interviews!
https://www.deeplearningbook.org/
There are also various courses and lectures, but those take time and effort. There are no shortcuts like the book posted by OP.
For example what is the definition of two events being independent in probability?
Or the L1 norm example: 'Which norm does the following equation represent? |x1 − x2| + |y1 − y2|'
Find the taylor series expansion for e^x (this is highschool maths).
Find the partial derivatives of f(x, y) = 3 sin²(x − y)
Limits etc...
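For reference, sketches of the standard answers to the questions above (stated from memory, worth double-checking):

```latex
% Independence: events A and B are independent iff
P(A \cap B) = P(A)\,P(B)

% The norm question: |x_1 - x_2| + |y_1 - y_2| is the L1 (Manhattan) distance.

% Taylor series of e^x about 0:
e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}
    = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots

% Partial derivatives of f(x, y) = 3\sin^2(x - y), via the chain rule:
\frac{\partial f}{\partial x} = 6\sin(x - y)\cos(x - y) = 3\sin\bigl(2(x - y)\bigr),
\qquad
\frac{\partial f}{\partial y} = -3\sin\bigl(2(x - y)\bigr)
```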
These aren't specific to deep learning or machine learning, not that I claim to be a practitioner.
Maybe those kinds of questions are OK for people without experience, but not for seniors.
The SQL questions can also be a symptom of the type of job - Facebook's first data science round focuses a lot on SQL but that's because it's a very product/analytics/decision-making focused role without that much coding or ML. With data science you have to be more careful about these things when searching for a job; you can't just use the job title as a descriptor.
Edit:
It seems the overlapping text also occurs on some pdf readers: https://github.com/BoltzmannEntropy/interviews.ai/issues/2
Maybe I've just been interviewing at the wrong places, I'd be very curious if anyone here has been asked to even explain Fisher information in any DS interview?
It's not that Fisher information is a particularly tricky topic, but I certainly wouldn't put it as a "must know" for even the most junior of data scientists. Not that I'd mind living in a world where this was the case... I'm just not sure I live in the same world as the authors.
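For anyone curious how untricky it actually is, here is the textbook one-observation Bernoulli case (a standard derivation, sketched from memory):

```latex
% Log-likelihood of one Bernoulli(p) observation x \in \{0, 1\}:
\ell(p) = x \log p + (1 - x)\log(1 - p)

% Score (derivative of the log-likelihood):
\frac{\partial \ell}{\partial p}
  = \frac{x}{p} - \frac{1 - x}{1 - p}
  = \frac{x - p}{p(1 - p)}

% Fisher information = variance of the score
% (using \operatorname{Var}(X) = p(1-p)):
I(p) = \mathbb{E}\!\left[\left(\frac{X - p}{p(1-p)}\right)^{2}\right]
     = \frac{\operatorname{Var}(X)}{p^{2}(1-p)^{2}}
     = \frac{1}{p(1-p)}
```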
I've been looking for something exactly like this – and it's executed better than I could have imagined.
(Needs a good proofreader still, though! Also, whatever custom LaTeX template the authors are using is misbehaving a bit in various places. Still great content.)
I am 99% certain I would not have passed the interview bars set today. More specifically, the breadth they expect you to master is very puzzling (and seemingly unrealistic).
1) Written by people who have no experience in industry, or who are not working on "real" machine learning jobs
2) They think the standard in industry is pretty low and any BS works. For example, the concept of the Lagrange multiplier is missing from the book. One needs this concept to understand training convergence guarantees.

I currently work for a non-profit investigating making a free, high-quality set of courses in this space, and would love to talk to as many people as possible who are either working in ML/DS or looking to get into the field. (I have ideas, but would prefer to ground them in as many real-world experiences as I can collect.)
If anyone here wouldn't mind chatting about this, or even just sharing an experience or opinion, please drop me an email (in my profile).
EDIT: We already have an Intro to DS course and a Deep RL sequence far along in our pipeline, but are looking to see where we can help the most with the available resources.
I really appreciate this Interviews book as an example of what topics might be necessary (and at what level), taking into account the qualifying discussion here, of course.
As someone with a strong background in statistics, please tell me where I can find DS jobs that require this.
For me and all my statistics friends in DS, the bigger frustration is how hard it is to pass DS interviews when you understand problems more deeply than "use XGBoost". I have found that very few data scientists really understand even basic statistics. I once failed an interview because the interviewer did not believe that logistic regression could be used to answer statistical-inference questions (when it, and the GLM more generally, is the workhorse of statistical work).
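The logistic-regression-for-inference point can be sketched from scratch (all data simulated, a plain Newton/IRLS fit rather than any particular library): you recover an effect size with standard errors from the inverse Fisher information, which is exactly the inferential use the interviewer didn't believe in.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
x = rng.normal(size=n)
true_beta = np.array([-0.5, 1.2])  # intercept, slope (the "effect")
X = np.column_stack([np.ones(n), x])
p = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = rng.binomial(1, p)

# Fit by Newton-Raphson (equivalently IRLS for the binomial GLM).
beta = np.zeros(2)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    W = mu * (1 - mu)
    grad = X.T @ (y - mu)
    hess = X.T @ (X * W[:, None])  # Fisher information matrix
    beta = beta + np.linalg.solve(hess, grad)

# Standard errors from the inverse Fisher information,
# giving approximate 95% Wald confidence intervals.
se = np.sqrt(np.diag(np.linalg.inv(hess)))
print(beta)                                 # close to [-0.5, 1.2]
print(beta - 1.96 * se, beta + 1.96 * se)   # ~95% intervals
```

The estimates land near the true coefficients, and the intervals quantify the uncertainty: inference, not just prediction.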
And to answer your question, whenever I'm in a hiring manager position I very strongly value strong software engineering skills. DS teams made up of people that are closer to curious engineers tend to greatly outperform teams made up of researchers that don't know you can write code outside of a notebook.
It's not really tested for in most places though, where they regard a DS as a service that produces models.
1) the titles will vary a lot (software engineer, ML engineer, research engineer, data scientist etc.) which makes it hard to locate those jobs and to move in the job market in general
2) you still need a reasonable amount of theory (not necessarily too much statistics) to use the tools well. And in all likelihood you will be tested on it in some way during the interviews.
3) the interviews/job descriptions that don't emphasise the theory often will be for jobs where you get a title like Machine Learning Engineer but you focus more on the infrastructure rather than on the ML code
However, one of the important things when interviewing someone is that the person has not seen the question before. So as an interviewer my impulse would be to first ensure that my question is NOT in this book :)
Or perhaps even if it is in the book, if the question is advanced enough, I could test how they articulate and reason through the solution, so I know they are not simply regurgitating the answer?