Show HN: Real world (Jupyter notebook embed) way to assess data scientists

(hackerrank.com)

134 points · rvivek · 6y ago · 48 comments

dannykwells · 6y ago · 4 in thread

Working data scientist here. As many have said, this is effectively a Kaggle challenge. Honestly, at this point I don't care at all how well someone can predict anything: there is very little correlation between that and how good a data scientist they are.

Tools to hire data scientists are going to continually fail until they realize that the interesting, hard part of being a data scientist is more akin to being a business lead (which can't really be tested in 60 minutes).

Concrete feedback:

- You ask for writing and descriptions on why a model was chosen, why features matter - are you grading this automatically? That would be a feat.

- The task is waaay too easy (even if you do believe there is a market for identifying people who can predict well).

- Python is overly limited. Why not SQL or R?

minimaxir · 6y ago

Disclosure: Got a preview of this product, my opinions only.

> You ask for writing and descriptions on why a model was chosen, why features matter - are you grading this automatically? That would be a feat.

Grading is apparently not automatic, which is good as I am not a fan of the Kaggle approach in this demo.

> The task is waaay too easy (even if you do believe there is a market for identifying people who can predict well)

You'd be very surprised about how candidates can respond to these types of questions!

> Python is overly limited. Why not SQL or R?

The full product allows Python, R, and Julia, with popular packages preinstalled for Python/R.

okl · 6y ago

Isn't the most important part weeding out the most unqualified applicants? For that purpose such a test might be fine.

shikharja · 6y ago

Thank you for the feedback!

- We are not automatically grading it. We have learned in the past that trying to automatically grade candidates on such challenges biases their approach, which defeats the point of a good data science challenge.

- That's good to know. We are not really focusing on the final outcome but on how creatively a candidate can go about the problem. The dataset allows for a good amount of creativity.

- Ah. Can you elaborate on what you mean by overly limited? We do support R.

iamwil · 6y ago

What is a business lead?

soVeryTired · 6y ago · 3 in thread

What you're asking for looks a lot like a kaggle challenge. You're asking if someone can use xgboost (or linear regression). It's the last ten per cent of a data science problem, if that.

The hard parts of data science include the following:

  - choosing the right input data (rather than relying on regularisation)
  - figuring out what the consequences are if you're wrong in a specific way, and avoiding the bad cases
  - wrangling your data into a nice CSV format
  - handling missing data
  - spotting biases in your data collection methodology

I'd expect a graduate to know about regression. For anyone else, this wouldn't help me assess their skills.
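
As a rough illustration of the "handling missing data" item in the list above, here is a minimal sketch that counts missing values per column before any modelling. The inline CSV, the column names, and the set of tokens treated as missing (`MISSING_TOKENS`) are assumptions made for the example, not part of the challenge dataset.

```python
import csv
import io

# Hypothetical sample; the real challenge data is not reproduced here.
SAMPLE_CSV = """country,gdp,internet_users
A,1.5,40
B,,NA
C,2.0,
"""

# Assumption: these tokens mark a missing value in the raw file.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null"}


def missing_counts(csv_text):
    """Return {column: number of missing cells} for a CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # DictReader yields None for cells absent from short rows.
            if value is None or value.strip().lower() in MISSING_TOKENS:
                counts[name] += 1
    return counts


print(missing_counts(SAMPLE_CSV))
```

A per-column count like this only surfaces the problem; whether to drop, impute, or flag the affected rows is the judgment call the comment is pointing at.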

anilgulecha · 6y ago

Correct, soVeryTired.

This ShowHN was to showcase the platform. Technically any CSVs/datasets/notebooks can be loaded in, and candidates for interviews can be asked to do any of the things you listed. The challenge you took was to showcase one specific example (regression).

shikharja · 6y ago

@soVeryTired, the challenge showcased in the test does involve a lot of wrangling, handling missing data points, spotting bias and identifying the right features for the regression model. The challenge is designed to allow for a candidate's creativity.

Do you think the data set we have used doesn't do it to the extent you'd expect from a data scientist?

tfehring · 6y ago

Any assessment that directly provides data sets - even with "gotcha"s like missing values - is testing, based on the conventional wisdom, at most 20% of a real-world data science workflow. And IMO it's the least critical 20%.

The only good end-to-end "technical" data science assessment I can think of is to pose a broad question or business problem that's addressable by applying data science techniques to publicly available data. But a nontrivial version of that assessment would take half a day on the very low end, and long assessments anti-select against good candidates.

IMO, when it comes to evaluating data scientists, the only thing that online coding assessments are good for is to ensure that they can perform basic coding and data manipulation tasks. (I'd include tasks like web scraping, image manipulation, API calls, and ORM stuff in this category). Everything else needs to be evaluated in person.

morelandjs · 6y ago · 3 in thread

No one does good science in 60 minutes.

minimaxir · 6y ago

True. But as long as all candidates have the same time limit / same expectation of work depth, and the test providers have a reasonable expectation of how much can be accomplished in that timeframe, then it's fair.

That said, this demo should have a several hour time limit.

listenallyall · 6y ago

A standardized, precise 40-yard dash might be fair. But it is also pretty useless if you are evaluating runners for a 1 mile race, or a marathon.

shikharja · 6y ago

That is correct. We had to reduce the time duration to be able to handle the traffic. We recommend 90 minutes for this challenge. What do you think would be the right amount of time for this problem?

rvivek (OP) · 6y ago · 3 in thread

Hello folks, would love your feedback on our new product to assess data scientists.

ska · 6y ago

I have a number of concerns about the efficacy of this, but they are made more difficult to rank by not understanding how you are planning to evaluate and use the results.

Can you elaborate?

anilgulecha · 6y ago

Evaluation is subjective at the moment, by a review of the Jupyter session by the hiring managers.

For certain data science use cases, evaluation is possible by taking a CSV output by a user and comparing that to an expected CSV.

(I worked on the product).
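
A minimal sketch of that kind of CSV comparison, assuming both files share an `id` column and a numeric `prediction` column and that a small absolute tolerance counts as a match. The column names, the tolerance, and the function names are hypothetical; the platform's actual checker is not described in this thread.

```python
import csv
import io


def read_predictions(csv_text, key="id", value="prediction"):
    """Parse CSV text into {key: float(value)}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: float(row[value]) for row in reader}


def fraction_matching(submitted_csv, expected_csv, tolerance=1e-6):
    """Fraction of expected rows whose submitted value is within tolerance."""
    submitted = read_predictions(submitted_csv)
    expected = read_predictions(expected_csv)
    if not expected:
        raise ValueError("expected CSV has no rows")
    hits = sum(
        1
        for key, target in expected.items()
        # A row missing from the submission counts as a miss.
        if key in submitted and abs(submitted[key] - target) <= tolerance
    )
    return hits / len(expected)


# Made-up data: row 2 is off by 0.05 and row 4 was never submitted.
expected = "id,prediction\n1,0.50\n2,0.75\n3,1.00\n4,0.10\n"
submitted = "id,prediction\n1,0.50\n2,0.80\n3,1.00\n"
print(fraction_matching(submitted, expected))
```

Keying on `id` rather than on row order means a submission that is sorted differently is not penalised for it.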

shikharja · 6y ago

ska, our aim with the challenge was to allow candidates to not be biased by a fixed outcome and try to solve the problem as they would solve any real data science problem.

This meant we couldn't automatically score/rank a candidate's solution. We do provide them with an evaluation metric in the problem description (Mean Absolute Error). Here is the scoring rubric we provide to the interviewers when they review the submission - https://d.pr/i/hNYY0u

Would love to hear more opinions on our scoring rubric.
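
For reference, Mean Absolute Error is the average of the absolute differences between predictions and true values. A minimal sketch follows; the function name and the sample numbers are made up for illustration and are not taken from the challenge.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over paired observations."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one observation")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only: the absolute errors are 0.5, 0.0 and 1.0.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.5, 2.0, 3.0]
print(mean_absolute_error(y_true, y_pred))
```

Because MAE is in the same units as the target, a score is only meaningful next to the scale of the target column and a naive baseline such as always predicting the median.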

data4lyfe · 6y ago · 2 in thread

I am in the camp that thinks this notebook judges data scientists in a way that will soon be obsolete.

If I'm given this clean dataset with all of the features properly set in columns and data types labeled, I could spin up Azure or Google Cloud's ML capabilities and have them run gridsearch and optimize my model.

Testing data scientists seems to be falling more into two buckets: people who can pull analytics and query databases to create the datasets and features, or people who can build the infrastructure to serve models, engineer pipelines, etc.

FYI though we're working on this now at https://www.interviewquery.com to try to start creating suitable tests to assess data scientists without having them do 10+ take homes every month.

minimaxir · 6y ago

> If I'm given this clean dataset with all of the features properly set in columns and data types labeled, I could spin up Azure or Google Cloud's ML capabilities and have them run gridsearch and optimize my model.

That's the fault of the test design allowing such techniques without scrutiny, not with the Notebook format.

enahs-sf · 6y ago

Gathering data and doing analysis seems like a much better skill set to look for than "can this person press run on this model". I hope you guys can figure it out; interviewing in the software industry is broken.

ryanferg · 6y ago · 2 in thread

I'm a data scientist (for an MLB team that will win the WS this year!) and I love this. Of course this isn't a whole end to end evaluation platform. But we will get 300-500 applications for a position sometimes, and often folks have no business applying and this would be a great way to filter out some of the noise. Great job!

shikharja · 6y ago

That's great to hear Ryan! You can sign up for the free trial for a full experience here - https://www.hackerrank.com/products/free-trial.

shikharja · 6y ago

If you would like to see more Data Science questions (not available in the free trial), I'd be happy to give you a demo. Let me know how to reach you.

tryitnow · 6y ago · 2 in thread

I think this is great as a self-assessment tool, especially for beginners.

It would work great with other learning tools, like MOOCs, datacamp, dataquest.io, as part of an overall data science learning process.

I'm more skeptical of its ability to help companies select candidates, but I could be very wrong about this and if I am then it's a huge win, so thanks for developing it.

I am super interested in seeing how you all develop this in the future; there's a lot of potential here. Is there a data-science-specific mailing list I can sign up for? I honestly have zero interest in hiring for other roles, so I am not going to sign up for a general mailing list.

shikharja · 6y ago

If you are interested, we are looking for data scientists and developers in general from the community to help us build these solutions and provide us with honest feedback. We are also looking at building support for other Data Science roles like Data Engineer. I would be more than happy to show you what we have and hear your thoughts on it. Let me know if you'd like to be a part of it and how I can reach out.

anilgulecha · 6y ago

There isn't a specific mailing list. However you can sign up for a one off webinar that will go into a lot of detail on the hiring manager flow. It's at the bottom of the launch blog post.

https://blog.hackerrank.com/hackerrank-projects-supports-dat...

ska · 6y ago · 2 in thread

Here are my high level thoughts after a quick look at the question and some clarification in thread below.

- a single question is difficult to evaluate. "Answering a business question" is at the very end, usually, of a bunch of exploratory steps

- 60 min is reasonable but not much time to evaluate real work. You either need to expand the time (also a problem, for interviewing) or allow scoring of "what I'd do next"

- tooling familiarity is going to be a huge factor with short time. Are you testing general knowledge or environment knowledge?

- too focused on models, too "kaggle-like". That covers about 20% of the skills and job.

Here are the sorts of things I look for. Do they understand:

1. How to verify & validate data, clean inputs, handle coding errors and ELT type issues

2. How to evaluate data set issues like bias, missing data and outliers, and account for that (and when you can't)

3. (situational) How their infrastructure works and what they need it to do (e.g. for distributed training, if appropriate). How to use it effectively.

4. How to control data and code throughout lifecycle, so you don't waste time and experiments

5. How to choose between approaches and models

6. How to evaluate performance rigorously

7. How to monitor performance over time

but here is the kicker

8. How do you know you are trying to solve the right problem?

For junior people, the emphasis will be on the earlier points. For senior people the last point is key.

Your question partially addresses some of the early points only.
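
To make point 2 in the list above slightly more concrete, here is a minimal sketch of one common outlier check, the interquartile-range rule. The multiplier `k=1.5`, the function name, and the sample values are assumptions for illustration; whether a flagged value is an error or a genuine extreme still needs a human decision.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 2:
        return []  # quantiles need at least two data points
    q1, _median, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Made-up measurements with one suspicious entry.
sample = [10, 12, 11, 13, 12, 95]
print(iqr_outliers(sample))
```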

Off the top of my head suggestions.

- Have separate stages. Cleanup & verification can have objective and subjective issues (missing & corrupt data? Outliers?)

- Don't focus too much on modeling, it's the least interesting part.

- Allow different toolsets possibly (e.g. R)

- Initial cleanup/eval stage on a CSV, but following stage pull from SQL?

- Possibly allow multiple inference choices from same or a few data sets. Give a short list of things the "business" is interested in, they pick and describe why

- good idea to focus a bit on producing one/two graphics/tables to communicate to a lay audience.

- more focus on verification

- add a validation discussion requirement. How are you going to know what you did is worth doing?

- add a "next things I would try/do"

The latter is going to be text heavy but no way to avoid this unless there is a follow on voice/personal interview.

There isn't any way you are going to auto score this stuff reliably, so that's probably ok. Consequence is your evaluators are going to actually have to be good at this.

shikharja · 6y ago

Thank you ska. This is pretty insightful, and actually makes sense. We will try to incorporate your suggestions as we create more challenges. We are also looking for more data scientists and developers in general to help us build these solutions, review them and share honest feedback. Would you like to work with us on this? I would love to hear your thoughts on the new features we are building for more Data Science roles. If yes, let me know how to reach out to you.

ska · 6y ago

Happy to discuss that, I have a lot of related experience that might help you. Where can I reach you by email?

sireat · 6y ago · 1 in thread

I suppose I am in the minority but I thought it was a pretty good FizzBuzz challenge for DS.

In fact I'd say it is a bit aggressive for a 60 minute challenge.

Quite a bit of data wrangling is expected to complete modeling on all columns. Some regex knowledge would help here too (for example, for wrangling the internet_users column).
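
The raw format of the `internet_users` column is not shown in this thread, so the sketch below assumes values shaped like "54 per 100 people" and normalises them to a fraction. The pattern, the sample strings, and the helper name are all hypothetical.

```python
import re

# Assumed shape: "<number> per <base> people", with optional thousands commas.
PATTERN = re.compile(
    r"^\s*([\d,]+(?:\.\d+)?)\s+per\s+([\d,]+)\s+people\s*$",
    re.IGNORECASE,
)


def parse_internet_users(text):
    """Return users as a fraction of the base, or None if unparseable."""
    match = PATTERN.match(text)
    if match is None:
        return None
    users = float(match.group(1).replace(",", ""))
    base = float(match.group(2).replace(",", ""))
    if base == 0:
        return None  # avoid dividing by a zero base
    return users / base


for raw in ["54 per 100 people", "250 per 1,000 people", "unknown"]:
    print(raw, "->", parse_internet_users(raw))
```

Returning `None` for anything that does not match keeps malformed cells visible as missing values instead of silently turning them into zeros.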

What was the idea behind asking for 20 most important features when we have 16 columns? Is it expected to do some feature engineering?

Disclaimer: I teach Python and basic Data Science to adults and I'd say most people would struggle to complete this in 60 minutes including myself.

shikharja · 6y ago

We had to reduce the time limit on the test to handle the traffic. The intended test duration is 90-120 minutes. I have updated the test duration to 90 minutes now.

There is indeed some feature engineering involved. The challenge in the test can be solved in the most obvious way possible, as well as in the most creative fashion. We believe how a Data Scientist goes about solving the problem is more important than a fixed outcome.

pequalsnp · 6y ago

This was pretty cool. For fun, I tried to get the best possible score I could, using XGBoost, without any feature engineering and achieved a MAE of 0.042422154541399665.

b_tterc_p · 6y ago

I think this is good as a way to filter people out, but not as a way to rank people to find the best.

I would want to see a short script to clean and predict a dataset, plus a small description of why choices were made.

Wouldn’t care much about the performance of the model.

kthejoker2 · 6y ago

Data scientists are fundamentally problem solvers.

The best way to assess technical problem solving is a structured hackathon. That is, to be given

* a problem with multiple subproblems and solution milestones

* with both objective and subjective criteria

* freedom of tools

* a "junkyard" of resources

* a fixed amount of time for each deliverable of the problem

And then you observe the process and the results.

For data science, the subproblems should be:

* requirements gathering / understanding the problem

* data acquisition, prep, and analysis

* refinement of requirements / communication

* feature engineering

* modeling

* presentation / storytelling / viz

PLenz · 6y ago

Data scientists (in my opinion as one) should be spending most of our time listening and talking to our colleagues, our clients, our peers, teachers, and, last of all, to the data. Last because we dive into the data searching for things (answers to questions) that people asked for, and because other people inform our journey and our methods.

It's counter-intuitive given the name, but good DS is people work first and data work like 9th.

tzm · 6y ago

This test is a relative assessment that tests the employer more than the employee.
