In general, if your data scientists can't explain the model, then something has gone horribly wrong.
For example, there have been plenty of issues like racial bias in the computer vision algorithms that police use, which is effectively data science on pictures. But nobody knows exactly why the issue occurs, and nobody can fix it specifically without the risk of breaking a thousand other unrelated things.
Sure we do. Photography itself has a racial bias. [0] Different skin tones produce different levels of detail, and it has been an uphill battle to be able to capture those details since the advent of photography.
So long as facial recognition relies on photography, and photography is flawed, then every dataset is biased. Which will exacerbate the bias of the AI, which already has its own set of problems from where those datasets are formed.
[0] https://www.nytimes.com/2019/04/25/lens/sarah-lewis-racial-b...
They do know why that occurs. It's because the data set is biased.
Too bad I wasn't a data scientist or else I could just get a passing grade by claiming the questions were chosen from a biased data set, or retake the exam until the data set matched the questions I studied for, at which point the data set would no longer be biased, lol.
Funny line of work, this data 'science' where you only use the results that fit the narrative you wanted in the first place.
We're in full doublethink mode, just keep repeating data 'science', 'science', 'science'. :)
With modern ML models data scientists can generally explain why the model should work but not exactly how it gives any specific prediction. Explaining how a model with a trillion parameters comes up with an answer is not a trivial task.
As an aside, this is a hilarious phrasing. What are we going to call these methods in a decade? You'd probably be better off phrasing it as deep neural networks.
Anyway, this is not really true. There are methods (for lots see: https://christophm.github.io/interpretable-ml-book/) and the DeepDream paper came out in 2015, so it's possible.
It's computationally expensive, and a lot of people don't see the value, but my argument is that if you want to use the model in the real world and have non-technical stakeholders, then you'll need to do this. In general, I've found it to be the best way to actually improve a model.
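As a concrete illustration of one of the simpler methods covered in that book, permutation importance shuffles one feature at a time and measures how much the model's score drops. The toy data, feature weights, and scikit-learn estimator here are my own stand-ins, just a sketch of the technique:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Toy dataset: feature 0 matters most, feature 1 matters a little,
# feature 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = 3 * X[:, 0] + X[:, 1]

model = RandomForestRegressor(random_state=0).fit(X, y)

# Shuffle each column in turn and record how much the R^2 score degrades.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# result.importances_mean should rank feature 0 first and put
# feature 2 near zero.
```

It doesn't explain any single prediction, but it does give stakeholders a defensible answer to "what is the model actually using?"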
And to be fair, if you just need to see how the predictions vary as a function of the inputs, you can again hold all but one constant and run a bunch of different values through the model.
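That hold-all-but-one sweep is only a few lines. The fitted model and the baseline choice below are assumptions for the sketch; the same loop works with any predictor that exposes a `predict`-style call:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy model standing in for whatever fitted predictor you actually have.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
y = X[:, 0] ** 2 + X[:, 1]  # feature 2 is noise
model = RandomForestRegressor(random_state=0).fit(X, y)

def sweep_feature(model, baseline, feature_idx, values):
    """Hold every input at `baseline` except one, sweep it, return predictions."""
    rows = np.tile(baseline, (len(values), 1))
    rows[:, feature_idx] = values
    return model.predict(rows)

baseline = X.mean(axis=0)          # one common choice of reference point
grid = np.linspace(-1, 1, 21)
preds = sweep_feature(model, baseline, 0, grid)
# preds should trace out the U-shape of x^2 as feature 0 varies.
```

Averaging such sweeps over many baselines instead of one gives you a partial dependence plot.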
Again, this can all be done, but I think it's more a question of will than capability (and hence the context of my original comment).
I phrased it the way I did for a reason. Large random forest models are also not easily interpretable. Even large logistic regression models with feature interactions and feature hashing aren't easily interpretable. It's not a question of the model technique used but rather the number of parameters and how many feature interactions are modeled.
>And to be fair, if you just need to see how the predictions vary as a function of the inputs, you can again hold all but one constant and run a bunch of different values through the model.
This only provides a partial view of the feature impact due to non-linear interactions.
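A toy example of why (the multiplicative "model" here is made up purely to show the failure mode): with f(x1, x2) = x1 * x2, sweeping x1 while x2 sits at its baseline of 0 looks completely flat, even though x1 matters a lot once x2 is nonzero.

```python
import numpy as np

def f(x1, x2):
    # Purely interactive model: neither feature has any effect on its own.
    return x1 * x2

grid = np.linspace(-1, 1, 11)

# One-at-a-time sweep with x2 held at its baseline of 0: all zeros,
# so x1 looks irrelevant.
flat = f(grid, 0.0)

# The same sweep with x2 = 1 reveals a strong effect of x1.
steep = f(grid, 1.0)
```

So the single-feature sweep tells you the effect at one reference point, not the effect in general.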