undefined | Better HN

0 pointssrean2mo ago0 comments

None of those (but they could be added to the mix to complicate matters).

Consider the case that the labelers creates the labelled training set by cherry picking those examples that are easy to label. He labels many, but selects the items to label according to his preference.

First question, is this even a problem. Yes, most likely. But why ? How to fix it ? When are such fixes even possible.

0 comments

2 comments · 1 top-level

p1esk2mo ago· 1 in thread

Yes, this is a problem - the most challenging samples might not even be present in your training data. This means your model will not perform well if real world data has lots of challenging samples.

This can be partially solved if we make some assumptions about your labeller:

1. they have still picked enough challenging samples.

2. their preferences are still based on features you care about.

3. he labelled the challenging samples correctly.

And probably some other assumptions should hold for distribution of labels, etc. But what we can do in this situation is first try to model that labeller preferences, by training a binary classifier - how likely he would choose this sample for labelling from the real-world distribution? If we train that classifier, we can then assign its confidence as a sample weight when preparing our training dataset (less likely samples get more weight). This would force our main classifier to pay more attention to the challenging samples during training.

This could help somewhat if all assumptions hold, but in practice I would not expect much improvement, and the solution above can easily make it worse - this problem needs to be solved by better labelling.

How did you solve it?

sreanOP2mo ago

By using the (estimated) Radon Nikodym derivative between the the two measures -- the measure from which the labelers samples and the deployed to measure from which the on-deployment items are presumably sampled.

For this to work the two measures need to be absolutely continuous with each other.

This is close to your pre-penultimate paragraph and that's mathy enough. This done right can take care of bias but may do so at the expense of variance, so this Radon Nikodym derivative that is estimated needs to be done so under appropriate regularization in the function space.

Thinking of the solution in these terms requires mathematical thinking.

Now let's consider the case where some features may be missing on instances at the time of deployment but always present in training and the features are uncorrelated with each other (by construction).

j / k navigate · click thread line to collapse